Abstract. We consider an estimator $\hat\beta_n(t)$ defined as the element $\phi \in \Phi$ minimizing a contrast process $\Lambda_n(\phi, t)$ for each $t$. We give some general results for deriving the weak convergence of $\sqrt{n}(\hat\beta_n - \beta)$ in the space of bounded functions, where, for each $t$, $\beta(t)$ is the $\phi \in \Phi$ minimizing the limit of $\Lambda_n(\phi, t)$ as $n \to \infty$. These results are applied in the context of penalized M-estimation, that is, when $\Lambda_n(\phi, t) = M_n(\phi) + t\,J_n(\phi)$, where $M_n$ is a usual contrast process and $J_n$ a penalty such as the $\ell^1$ norm or the squared $\ell^2$ norm. The function $\hat\beta_n$ is then called a regularization path. For instance we show that the central limit theorem established for the lasso estimator in Knight and Fu [2000] continues to hold in a functional sense for the regularization path. Other examples include various possible contrast processes for $M_n$ such as those considered in Pollard [1985].

1. Introduction 

Let us consider a real-valued contrast process $\{M_n(\phi),\ \phi \in \Phi\}$ based on an observed sample of size $n$ and a contrast function $M$ defined on the same parameter set $\Phi$ and minimized at the point $\beta$. A penalized estimator with penalty weight $t \geq 0$ is defined as the minimizer of the contrast process

$$\Lambda_n(\phi, t) = M_n(\phi) + t\,J_n(\phi), \qquad \phi \in \Phi, \tag{1}$$

where $J_n$ is a non-negative function defined on $\Phi$, not depending on the observations but possibly on $n$, mainly to allow some appropriate normalization.

The use of penalties is popular for ill-posed problems and model selection, among which ridge regression (see Hoerl and Kennard [1970]) and the lasso (see Tibshirani [1996]) are emblematic examples. In these two examples the contrast process $M_n$ is the least-squares criterion and the penalty function $J_n$ is the squared $\ell^2$ norm and the $\ell^1$ norm, respectively.



Consistency and central limit theorems are established in Knight and Fu [2000] precisely in the case where $M_n$ is the least-squares criterion and $J_n$ belongs to a family of penalties including both the squared $\ell^2$ norm and the $\ell^1$ norm. They show that, when the penalty is properly normalized, the penalized mean square estimator is no longer asymptotically normal. Instead, its asymptotic distribution is given by the minimizer of a penalized quadratic form depending on a Gaussian vector (see e.g. [Knight and Fu, 2000, Theorem 2]).
Their asymptotic results hold as the number $n$ of observations tends to infinity and for a fixed finite-dimensional model. Quite different results have been established when the dimension of the model increases with $n$, see Greenshtein and Ritov [2004], Zhao and …, Bunea et al. [2007], Bickel et al. [2008] and the references therein. These results provide interesting properties of the lasso for model selection or prediction purposes in the context of sparse models. Although specific normalizations of the penalty (different from those required in Knight and Fu [2000]) are prescribed in these theoretical results, there exist numerous heuristic ways of choosing the penalty weight $t$ in practice. The first step is to minimize $\Lambda_n(\phi, t)$ in (1) over $\phi \in \Phi$ for a collection of non-negative weights $t$, resulting in a collection of estimators $\hat\beta_n(t)$, which is called the regularization path (or the solution path). The Least Angle Regression (LAR) technique introduced in Efron et al. [2004] provides, in most cases, the entire path, computed with the complexity of a linear regression. In a second step, some criterion is used to select $t$, see e.g. Zou et al. [2007], where AIC and BIC procedures are proposed for the lasso. Because the whole path is used by the practitioner, we think that it is crucial to examine whether the convergence of $\sqrt{n}(\hat\beta_n(t) - \beta)$, established in Knight and Fu [2000] for one fixed $t$, continues to hold in a functional sense and, if it is the case, to determine the limit distribution. The goal of this paper is twofold. First we show that, under the same assumptions as in Knight and Fu [2000], the convergence holds in the space of locally bounded functions. Second we extend this result to more general contrast processes $M_n$ such as generalized linear models (GLM) or least absolute deviation (LAD). A key result is a pathwise argmin theorem which establishes the functional weak convergence of a path defined as the minimizer of a collection of contrast processes, see Theorem 3.

For the moment let us give the asymptotic behavior of the lasso regularization path, which is the simplest application of our results and which naturally extends Knight and Fu [2000]. Consider the linear model

$$y_k = x_k^T \beta + \epsilon_k, \qquad k = 1, 2, \ldots \tag{2}$$

where $\beta \in \mathbb{R}^p$ is an unknown parameter, $(y_k)$ is a sequence of real-valued observations, $(x_k)$ is the sequence of regression vectors and $(\epsilon_k)$ is a strong white noise with variance $\sigma^2$. For any $t \geq 0$, the lasso estimator $\hat\beta_n(t)$ minimizes the penalized contrast process $\Lambda_n(\phi, t)$ over $\phi \in \mathbb{R}^p$, where

$$\Lambda_n(\phi, t) = \frac{1}{n} \sum_{k=1}^{n} (y_k - x_k^T \phi)^2 + t\,\lambda_n \sum_{j=1}^{p} |\phi_j|, \tag{3}$$

which is a specific form of (1). Denote $X_n = [x_1, \ldots, x_n]^T$. We consider the following assumptions, for consistency and for the central limit theorem, respectively. The assumptions are the same as in Knight and Fu [2000].



Assumption 1.

(i) $C_n = n^{-1} X_n^T X_n \to C$, where $C$ is a positive-definite matrix;

(ii) $\lambda_n \to 0$.

Assumption 2.

(i) Assumption 1-(i) holds;

(ii) $\max_{1 \leq k \leq n} \|x_k\|^2 = o(n)$;

(iii) $\lambda_n = n^{-1/2}$.

Assumptions 1-(i) and 2-(ii) are the classical assumptions for the asymptotic behavior of least-squares estimators. The other assumptions provide the appropriate way of normalizing the penalty.
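To make criterion (3) concrete, here is a minimal sketch (not the authors' code, and not the LAR algorithm of Efron et al. [2004]) that approximates the lasso path $t \mapsto \hat\beta_n(t)$ on a grid of weights by cyclic coordinate descent with warm starts, using the normalization $\lambda_n = n^{-1/2}$ of Assumption 2-(iii). The helper names, the simulated Gaussian design and the stopping tolerance are illustrative assumptions. With $z_j = n^{-1}\sum_k x_{kj}^2$ and $\rho_j$ the normalized correlation of column $j$ with the partial residual, each coordinate update minimizes $z_j \phi_j^2 - 2\rho_j \phi_j + t\lambda_n|\phi_j|$, whose solution is the soft-thresholded value $S(\rho_j, t\lambda_n/2)/z_j$.

```python
import numpy as np

def soft_threshold(u, c):
    """S(u, c) = sign(u) * max(|u| - c, 0), applied elementwise."""
    return np.sign(u) * np.maximum(np.abs(u) - c, 0.0)

def lasso_cd(X, y, t, lam_n, phi0=None, n_iter=1000, tol=1e-10):
    """Minimize (1/n) * sum_k (y_k - x_k^T phi)^2 + t * lam_n * sum_j |phi_j|
    by cyclic coordinate descent (columns of X assumed non-zero)."""
    n, p = X.shape
    phi = np.zeros(p) if phi0 is None else phi0.copy()
    z = (X ** 2).sum(axis=0) / n              # z_j = n^{-1} sum_k x_kj^2
    resid = y - X @ phi                       # current residuals y - X phi
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # rho_j = n^{-1} sum_k x_kj * (partial residual without coordinate j)
            rho = X[:, j] @ resid / n + z[j] * phi[j]
            new = soft_threshold(rho, t * lam_n / 2.0) / z[j]
            delta = new - phi[j]
            resid -= X[:, j] * delta
            phi[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return phi

def lasso_path(X, y, ts, lam_n):
    """Approximate regularization path t -> beta_hat_n(t) on the grid ts (warm starts)."""
    path, phi = [], None
    for t in ts:
        phi = lasso_cd(X, y, t, lam_n, phi0=phi)
        path.append(phi.copy())
    return np.array(path)

# Usage on data simulated from model (2) (illustrative n, p, beta and noise level).
rng = np.random.default_rng(0)
n, p = 200, 3
beta = np.array([1.0, 0.0, -0.5])
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)
ts = np.linspace(0.0, 5.0, 11)
path = lasso_path(X, y, ts, lam_n=n ** -0.5)   # lambda_n = n^{-1/2}
```

Since the lasso path is piecewise linear in $t$, a grid only approximates it; path-following methods such as LAR compute its breakpoints exactly. At $t = 0$ the sketch returns the least-squares estimator.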

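Theorem 1. Under Assumption 1, $\hat\beta_n(t)$ converges in probability to $\beta$ locally uniformly in $t \in \mathbb{R}_+$, that is

$$\hat\beta_n \xrightarrow{P} \beta \quad \text{in } \ell^\infty_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{R}^p), \tag{4}$$

where $\ell^\infty_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{R}^p)$ denotes the space of locally bounded $\mathbb{R}_+ \to \mathbb{R}^p$ functions.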

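We now define the limit process of the lasso regularization path, appropriately centered and normalized. Let $U \sim \mathcal{N}(0, \sigma^2 C)$. For any $t \geq 0$, we define $u(t)$ as the point $\phi \in \mathbb{R}^p$ which minimizes

$$L(\phi, t) = -2 U^T \phi + \phi^T C \phi + t \sum_{j=1}^{p} \left[ \phi_j \operatorname{sgn}(\beta_j)\, \mathbf{1}_{\{\beta_j \neq 0\}} + |\phi_j|\, \mathbf{1}_{\{\beta_j = 0\}} \right]. \tag{5}$$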



It is easy to show that this defines $u(t)$ uniquely for all $t \geq 0$ (see the proof of Theorem 2). The distribution of $u$ as a function is not explicit, but it is not more complicated than its marginal distributions already described in Knight and Fu [2000], since the whole path is a deterministic function of the random variable (r.v.) $U$. An interesting property of $u(t)$ is that, with probability 1, the set of its components that vanish for $t$ large enough is given by the set of zero components of the true parameter $\beta$.
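Since $u$ is a deterministic function of $U$, the limit path can be evaluated on a grid of $t$ once $U$ is drawn. The following is a minimal sketch, assuming $C$ symmetric positive definite and using cyclic coordinate descent (a numerical choice made here, not prescribed by the text). In coordinate $j$, with $a_j = U_j - \sum_{i \neq j} C_{ji}\phi_i$, criterion (5) reduces to $C_{jj}\phi_j^2 - 2a_j\phi_j$ plus either the linear term $t\operatorname{sgn}(\beta_j)\phi_j$ (if $\beta_j \neq 0$) or $t|\phi_j|$ (if $\beta_j = 0$), both minimized in closed form. The function name and the values of $C$, $\beta$, $\sigma$ and the seed are illustrative.

```python
import numpy as np

def limit_path_point(U, C, beta, t, n_iter=1000, tol=1e-12):
    """Minimize L(phi, t) in (5) by cyclic coordinate descent (C symmetric positive definite)."""
    p = len(U)
    phi = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            a = U[j] - (C[j] @ phi - C[j, j] * phi[j])   # a_j = U_j - sum_{i != j} C_ji phi_i
            if beta[j] != 0:
                # smooth coordinate: minimize C_jj phi^2 - 2 a phi + t sgn(beta_j) phi
                new = (a - t * np.sign(beta[j]) / 2.0) / C[j, j]
            else:
                # l1 coordinate: minimize C_jj phi^2 - 2 a phi + t |phi| (soft thresholding)
                new = np.sign(a) * max(abs(a) - t / 2.0, 0.0) / C[j, j]
            max_change = max(max_change, abs(new - phi[j]))
            phi[j] = new
        if max_change < tol:
            break
    return phi

# Usage: draw U ~ N(0, sigma^2 C) once, then evaluate the path t -> u(t) on a grid.
rng = np.random.default_rng(1)
sigma = 1.0
C = np.array([[1.0, 0.3], [0.3, 1.0]])
beta = np.array([2.0, 0.0])
U = rng.multivariate_normal(np.zeros(2), sigma ** 2 * C)
ts = np.linspace(0.0, 10.0, 21)
u_path = np.array([limit_path_point(U, C, beta, t) for t in ts])
```

At $t = 0$ the minimizer is $C^{-1}U$; as $t$ grows, the coordinate attached to the non-zero $\beta_1$ decreases without bound in this example.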

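Theorem 2. Under Assumption 2,

$$\sqrt{n}(\hat\beta_n - \beta) \rightsquigarrow u \quad \text{in } \ell^\infty_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{R}^p), \tag{6}$$

where $\rightsquigarrow$ denotes weak convergence.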

Remark 1. Convergence in $\ell^\infty_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{R}^p)$ is equivalent to uniform convergence on every compact subset of $\mathbb{R}_+$. In fact the convergences (4) and (6) cannot be improved, in the sense that they do not hold uniformly on $\mathbb{R}_+$. To see why, observe that, by the definition of $u$, its coordinates corresponding to non-vanishing $\beta_j$ are unbounded as $t \to \infty$. In contrast, the left-hand side of (6) is bounded since, for any $n$, there is a large enough $t$ for which $\hat\beta_n(t) = 0$. Note that this also implies that $\sup_{t \in \mathbb{R}_+} \|\hat\beta_n(t) - \beta\| \geq \|\beta\|$, and thus that the consistency (4) does not hold (unless $\beta = 0$) if $\ell^\infty_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{R}^p)$ is replaced by the set $\ell^\infty(\mathbb{R}_+, \mathbb{R}^p)$ of bounded $\mathbb{R}_+ \to \mathbb{R}^p$ functions endowed with the sup norm.
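Remark 1 uses the fact that, for each $n$, the lasso path vanishes for $t$ large enough. For criterion (3) the threshold is explicit: by the subgradient optimality condition at $\phi = 0$, the point $\phi = 0$ minimizes (3) if and only if $\|2n^{-1}X_n^T y\|_\infty \leq t\lambda_n$, where $y = (y_1, \ldots, y_n)^T$. A minimal sketch computing this threshold, with hypothetical helper names and toy data ($\lambda_n = 1$ for readability):

```python
import numpy as np

def lasso_zero_threshold(X, y, lam_n):
    """Smallest t >= 0 for which phi = 0 minimizes criterion (3):
    0 is optimal iff ||2 n^{-1} X^T y||_inf <= t * lam_n."""
    n = X.shape[0]
    return np.max(np.abs(2.0 * X.T @ y / n)) / lam_n

def lasso_objective(phi, X, y, t, lam_n):
    """Criterion (3): (1/n) ||y - X phi||^2 + t * lam_n * ||phi||_1."""
    n = X.shape[0]
    return np.sum((y - X @ phi) ** 2) / n + t * lam_n * np.sum(np.abs(phi))

# Toy data: n = 3 observations, p = 2 regressors.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
t0 = lasso_zero_threshold(X, y, lam_n=1.0)   # X^T y = (4, 5), so t0 = 2 * 5 / 3 = 10/3
```

Just above `t0`, small moves away from zero increase the objective; just below it, moving the second coordinate in the positive direction decreases it.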

The proofs of Theorem 1 and Theorem 2 are applications of some general results on the consistency of convex penalized M-estimators and on the weak convergence of argmins depending on a tuning parameter $t$ (the so-called pathwise argmin theorem in the following). More general penalized contrasts will also be considered. Such extensions are of interest since the lasso regularization path has been extended to the case where $M_n$ is different from the least-squares criterion. In Park and Hastie [2007], a fast numerical algorithm is proposed for determining the regularization path when $M_n$ is a regression function based on a negated log-likelihood of the canonical exponential family. In Germain [2007], a fast algorithm based on a dichotomy is proposed to explore the range of $t$'s in the specific case of logistic regression penalized by the $\ell^1$ norm.
The paper is organized as follows. In Section 2 we provide a pathwise argmin theorem (Theorem 3). Section 3 is concerned with the asymptotic behavior of the regularization path of a penalized contrast. Very mild conditions on the contrast and on the penalty are provided for obtaining the uniform consistency and the central limit theorem of the path, and particular attention is given to the case where both the contrast and the penalty are convex. Except for the convex case, such results can actually be seen as special cases of the more general study of pathwise M-estimators, which is treated in Section 4. Finally we provide several examples of applications of these results in Section 5, including the $\ell^1$-penalized generalized linear model (GLM) introduced in Park and Hastie [2007], the penalized least absolute deviation (LAD) and the Akaike information criterion. Concluding remarks are provided in Section 6. The detailed proofs are deferred to the appendix for convenience.



2. A pathwise argmin theorem



To obtain a CLT for the regularization path, we rely on a pathwise argmin theorem, which is of independent interest and can be seen as an extension of [Kim and Pollard, 1990, Theorem 2.7] (see also [Van der Vaart and Wellner, 1996, Theorem 3.2.2]) to fit the context of a path defined as the minimizer of a collection of contrast processes.



Let us recall some of the terminology and notation used in Van der Vaart and Wellner [1996]. For a metric space $\mathbb{D}$, we say that a sequence of $\mathbb{D}$-valued maps $(X_n)$ defined on $\Omega$ converges weakly to a $\mathbb{D}$-valued map $X$ defined on $(\Omega, \mathcal{F}, P)$, and denote $X_n \rightsquigarrow X$, if $X$ is a Borel map and, for any real-valued bounded continuous function $f$ defined on $\mathbb{D}$,

$$E^*[f(X_n)] \to E[f(X)],$$

where $E$ denotes the expectation with respect to $P$ and $E^*$ denotes the outer expectation, defined for every real-valued map $Z$ defined on $\Omega$ by $E^*[Z] = \inf\{E[U] : U \geq Z\}$, where, in this infimum, the r.v. $U$ is taken measurable. The inner expectation and inner probability are respectively defined by $E_*[Z] = -E^*[-Z]$ and $P_*(A) = 1 - P^*(A^c)$, where $A^c$ denotes the complementary set of $A$ in $\Omega$.

For any positive integer $p$ and any set $T$ we further denote by $\ell^\infty(T, \mathbb{R}^p)$ the normed space of bounded functions $f = (f_1, \ldots, f_p)$ defined on $T$ and taking values in $\mathbb{R}^p$, endowed with the sup norm on $T$, denoted by

$$\|f\|_T = \sup_{t \in T,\ i \in \{1, \ldots, p\}} |f_i(t)|.$$

We will simply denote $\ell^\infty(T, \mathbb{R}^p)$ by $\ell^\infty(T)$ for $p = 1$.

Theorem 3. Let $\Phi$ be a metric space endowed with a metric $d$ and $T$ be an arbitrary set. We suppose that we are in one of the two following cases:

(C-1) $T$ is a finite set. In this case, we set $\mathbb{D} = \Phi^T$ endowed with the product topology;

(C-2) $\Phi = \mathbb{R}^p$ with $p \geq 1$, $d$ being the Euclidean metric. In this case, we set $\mathbb{D} = \ell^\infty(T, \mathbb{R}^p)$.

Let $\{L_n(\phi, t),\ \phi \in \Phi, t \in T\}$ be a sequence of real-valued processes, $\{L(\phi, t),\ \phi \in \Phi, t \in T\}$ be a real-valued process, $\{u(t),\ t \in T\}$ be a $\Phi$-valued process, and $\{u_n(t),\ t \in T\}$ be a sequence of $\Phi$-valued processes. Assume that

(i) for any compact set $K \subset \Phi$, $L_n \rightsquigarrow L$ in $\ell^\infty(K \times T)$ and $L$ is a tight Borel map taking values in $\ell^\infty(K \times T)$;

(ii) for any $\eta > 0$ and compact $K \subset \Phi$, we have almost surely that
$$\inf_{t \in T} \left[ \inf\{L(\phi, t) : \phi \in K,\ d(\phi, u(t)) \geq \eta\} - L(u(t), t) \right] > 0; \tag{7}$$

(iii) for any $\epsilon > 0$, there exists a compact $K \subset \Phi$ such that
$$P(u(t) \in K \text{ for all } t \in T) \geq 1 - \epsilon; \tag{8}$$

(iv) for any $\epsilon > 0$, there exists a compact $K \subset \Phi$ such that
$$\liminf_{n \to \infty} P_*(u_n(t) \in K \text{ for all } t \in T) \geq 1 - \epsilon; \tag{9}$$

(v) $u_n$ is approximately minimizing $L_n$,
$$\sup_{t \in T} \left[ L_n(u_n(t), t) - \inf_{\phi \in \Phi} L_n(\phi, t) \right] = o_{P^*}(1). \tag{10}$$

Then there is a version of $u$ in $\mathbb{D}$ and $u_n \rightsquigarrow u$.

Proof of Theorem 3 in the case (C-1). In the case (C-1), where $T$ is finite, for any compact $K \subset \Phi$, $K^T$ is a compact subset of $\Phi^T$ endowed with the metric

$$d_T(u, v) = \sup_{t \in T} d(u(t), v(t)).$$

Hence, in this case, Conditions (iii) and (iv) respectively say that $u$ is tight and that $u_n$ is uniformly tight in $\Phi^T$. In this case, the conclusion of Theorem 3 follows almost directly from Theorem 3.2.2 in Van der Vaart and Wellner [1996]. To see why, let us introduce the following contrast process

$$\ell_n(v) = \sup_{t \in T} \left[ L_n(v(t), t) - \inf_{\phi \in \Phi} L_n(\phi, t) \right], \qquad v \in \Phi^T. \tag{11}$$

Observe that defining $u_n(t)$ as a minimizer of $L_n(\cdot, t)$ for all $t \in T$ is equivalent to defining $u_n$ directly as a minimizer of $\ell_n$. In particular Condition (v) implies that

$$\ell_n(u_n) \leq \inf_{v} \ell_n(v) + o_{P^*}(1),$$

that is, $u_n$ is a near minimizer of $\ell_n$. Condition (i), in turn, by the continuous mapping theorem, implies that $\ell_n \rightsquigarrow \ell$ in $\ell^\infty(K^T)$, where, for any $v \in K^T$,

$$\ell(v) = \sup_{t \in T} \left[ L(v(t), t) - \inf_{\phi \in \Phi} L(\phi, t) \right].$$

Finally it is not too difficult to show that Condition (ii) implies that, almost surely, for all compact $K \subset \Phi$ and $\eta > 0$,

$$\inf\{\ell(v) : v \in K^T,\ d_T(v, u) \geq \eta\} > 0 = \ell(u).$$

This condition corresponds to the semicontinuity and argmax uniqueness conditions appearing in Theorem 3.2.2 in Van der Vaart and Wellner [1996]. Hence this theorem applies and yields $u_n \rightsquigarrow u$ in the case (C-1). □
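The equivalence used in this proof, between minimizing $L_n(\cdot, t)$ separately for each $t$ in a finite $T$ and minimizing the single contrast $\ell_n$ of (11) over $\Phi^T$, can be checked by brute force on a toy problem. In the sketch below, $\Phi$ is replaced by a finite grid (so every infimum is a minimum), and the contrast $L(\phi, t) = (\phi - 1)^2 + t|\phi|$ is a hypothetical example chosen so that each pointwise minimizer is unique.

```python
import itertools

def pointwise_argmin(L, phis, ts):
    """u(t) = argmin_{phi in phis} L(phi, t), computed separately for each t in T."""
    return tuple(min(phis, key=lambda phi: L(phi, t)) for t in ts)

def joint_argmin(L, phis, ts):
    """Minimize ell(v) = sup_t [L(v(t), t) - min_phi L(phi, t)] over all maps v: T -> phis."""
    inf_L = [min(L(phi, t) for phi in phis) for t in ts]   # min_phi L(phi, t), one per t

    def ell(v):
        return max(L(v_t, t) - m for v_t, t, m in zip(v, ts, inf_L))

    # Phi^T is finite here: enumerate all |phis|^|T| candidate paths v.
    return min(itertools.product(phis, repeat=len(ts)), key=ell)

def L(phi, t):
    return (phi - 1.0) ** 2 + t * abs(phi)

phis = [k / 4 for k in range(-8, 9)]   # grid of [-2, 2] with step 0.25
ts = [0.0, 0.5, 1.0]                   # finite index set T, as in case (C-1)
pointwise = pointwise_argmin(L, phis, ts)
joint = joint_argmin(L, phis, ts)
print(pointwise, joint)                # both equal (1.0, 0.75, 0.5)
```

Since $\ell(v) \geq 0$ with equality exactly when $v(t)$ minimizes $L(\cdot, t)$ for every $t$, the two computations agree whenever the pointwise minimizers are unique.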

The proof in the case (C-2) is postponed to the appendix. The main originality of Theorem 3 lies in the case (C-2). In this case, Theorem 3.2.2 in Van der Vaart and Wellner [1996] cannot be directly applied because Condition (iv) is no longer a uniform tightness condition ($K^T$ is not a compact subset of $\ell^\infty(T, \mathbb{R}^p)$). The key idea, detailed in the appendix, is to show that, under the conditions of Theorem 3, this asymptotic tightness of $u_n$ in $\ell^\infty(T, \mathbb{R}^p)$ is inherited from that of $L_n$ assumed in Condition (i).

3. Penalized M-estimation 



3.1. Uniform consistency. Standard results on the consistency of M-estimators (see e.g. [Van der Vaart, 1998, Theorem 5.7]) roughly say that if $\hat\beta_n$ is a sequence of minimizers of $M_n$ on $\Phi$, $M_n$ tends to $M$ with some uniformity and $\beta$ is an isolated minimum of $M$ on $\Phi$, then $\hat\beta_n$ converges to $\beta$ in probability. We will use the following set of conditions, which are slightly weaker than the classical ones.



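Assumption 3. There exists $\beta \in \Phi$ such that

(i) $\sup_{\phi \in \Phi} \{M(\phi) - M_n(\phi)\}_+ \xrightarrow{P} 0$, where $a_+ = \max(0, a)$ for any $a \in \mathbb{R}$;

(ii) $M_n(\beta) \xrightarrow{P} M(\beta)$;

(iii) for all $\epsilon > 0$, $\inf\{M(\phi) : \phi \in \Phi,\ d(\phi, \beta) \geq \epsilon\} > M(\beta)$,

where $d$ is a metric endowing the metric space $\Phi$.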

Let us briefly comment on these assumptions. Conditions (i) and (ii) are generally replaced by the stronger uniform convergence condition $\sup_{\phi \in \Phi} |M(\phi) - M_n(\phi)| \xrightarrow{P} 0$. These weaker conditions are for instance useful when $\Phi$ is non-compact, since it is then sufficient to show the uniform convergence on a compact subset and to provide a lower bound of $M_n$ outside of this compact. Condition (iii) is the standard condition which defines $\beta$ as the (unique) isolated minimum of the limit contrast function.

We will show that, under Assumption 3, provided that $J_n(\beta)$ tends to 0, the minimizer $\hat\beta_n(t)$ of $\Lambda_n(\phi, t)$ converges to $\beta$, locally uniformly in $t$. To avoid making measurability assumptions on the path $t \mapsto \hat\beta_n(t)$, we need to work with outer probability to extend the probability to possibly non-measurable sets. Given a probability space $(\Omega, \mathcal{F}, P)$, we denote by $P^*$ the outer probability defined on the subsets of $\Omega$ by

$$P^*(A) = \inf\{P(B) : B \in \mathcal{F} \text{ with } A \subset B\}, \qquad A \subset \Omega.$$

We say that a sequence $(Y_n)$ of real-valued maps defined on $\Omega$ converges in $P^*$-probability to 0, and denote $Y_n \xrightarrow{P^*} 0$, if, for any $\epsilon > 0$, $P^*(\{|Y_n| > \epsilon\}) \to 0$. Here $\{|Y_n| > \epsilon\}$ is the usual shorthand notation for the subset $\{\omega \in \Omega : |Y_n(\omega)| > \epsilon\}$. When $Y_n$ is measurable as a map taking values in $\mathbb{R}$ endowed with the Borel $\sigma$-field, this is equivalent to the usual convergence in probability.

Theorem 4. Suppose that Assumption 3 holds for some $\beta \in \Phi$, $M$ defined on $\Phi$ and $\{M_n(\phi),\ \phi \in \Phi\}$, a sequence of real-valued processes. Let $(J_n)$ be a sequence of non-negative functions defined on $\Phi$ such that $J_n(\beta) \to 0$. Let $T$ be a compact subset of $[0, \infty)$ and suppose that we have a $\Phi$-valued process $\{\hat\beta_n(t),\ t \geq 0\}$ such that

$$\sup_{t \in T} \{\Lambda_n(\hat\beta_n(t), t) - \Lambda_n(\beta, t)\}_+ \xrightarrow{P^*} 0, \tag{12}$$

where $\Lambda_n$ is defined by (1). Then $\hat\beta_n(t)$ converges to $\beta$ uniformly in $t \in T$, in $P^*$-probability, that is,

$$\sup_{t \in T} d(\hat\beta_n(t), \beta) \xrightarrow{P^*} 0. \tag{13}$$
Remark 2. In statistical applications the contrast function $M$ in Assumption 3 depends on the unknown distribution of the contrast process $M_n$, and thus $\beta$ is an unknown point of $\Phi$. In particular, the convergence condition $J_n(\beta) \to 0$ has to be verified for any $\beta \in \Phi$ (but not uniformly in $\beta$) and it simply amounts to correctly normalizing the penalty $J_n$ as $n \to \infty$.



Remark 3. The same result holds if the convergence in $P$-probability in Assumption 3-(i) is replaced by a convergence in $P^*$-probability. However, in applications, the smoothness properties of $\phi \mapsto M_n(\phi)$ and $\phi \mapsto M(\phi)$ usually imply that $\sup_{\phi \in \Phi} \{M(\phi) - M_n(\phi)\}_+$ is a measurable function.

Remark 4. The fact that the outer probability $P^*$ appears in (12) does not bring real difficulties in applications. Indeed Condition (12) follows from the definition of $\hat\beta_n(t)$ as a near minimizer of $\Lambda_n(\cdot, t)$, that is, if $\hat\beta_n(t)$ satisfies

$$\Lambda_n(\hat\beta_n(t), t) \leq \inf_{\phi \in \Phi} \Lambda_n(\phi, t) + u_n,$$

with $u_n = o_P(1)$ not depending on $t$, e.g. $u_n = 0$ (perfect minimizer) or some small positive $u_n$ tending to zero (near minimizer). The numerical computation of a near minimizer is a difficult task in general, in particular in the presence of several local minima. We will focus on convexity assumptions in Section 3.2, which cover many cases of interest and usually allow tractable numerical computation of $\hat\beta_n(t)$ for any $t$.

Remark 5. Although $\hat\beta_n(t)$ is an r.v. for any $t$, the map $\sup_{t \in T} \|\hat\beta_n(t) - \beta\|$ defined on $\Omega$ may not be measurable (it is in some particular cases, for instance if the map $t \mapsto \hat\beta_n(t)$ is continuous). This is where the outer probability is useful. Nevertheless, for any $t \in T$, the event $\{d(\hat\beta_n(t), \beta) > \epsilon\}$ is measurable, and its probability is at most $P^*(\sup_{s \in T} d(\hat\beta_n(s), \beta) > \epsilon)$, which tends to zero by (13); hence, for any $t \in T$, $\hat\beta_n(t) \xrightarrow{P} \beta$.

Remark 6. For $T = \{0\}$ in (13), we get a standard result on the consistency of M-estimators (without penalty). It is important to notice that the consistency of penalized M-estimators is obtained for free, in the sense that no additional assumption on $M_n$ or $M$ is required and the only assumption on $J_n$ is $J_n(\beta) \to 0$.
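A small simulation can illustrate the uniform consistency of Theorem 4 on a compact $T$. The sketch below assumes the linear model (2) with Gaussian regressors and noise, takes $M_n$ as the least-squares part of (3), and uses the hypothetical ridge-type penalty $J_n(\phi) = n^{-1/2}\|\phi\|^2$, which satisfies $J_n(\beta) \to 0$. For this choice the minimizer is available in closed form, $\hat\beta_n(t) = (n^{-1}X_n^TX_n + t n^{-1/2} I)^{-1} n^{-1} X_n^T y$. The supremum over $T = [0, 5]$ is approximated on a grid, so this is only a numerical illustration, not a proof.

```python
import numpy as np

def ridge_path_sup_error(n, beta, ts, sigma=1.0, seed=0):
    """Grid approximation of sup_{t in T} ||beta_hat_n(t) - beta|| for
    Lambda_n(phi, t) = (1/n) ||y - X phi||^2 + t * n^{-1/2} * ||phi||^2."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    G, b = X.T @ X / n, X.T @ y / n
    # first-order condition: (G + t n^{-1/2} I) phi = b
    errs = [np.linalg.norm(np.linalg.solve(G + t / np.sqrt(n) * np.eye(p), b) - beta)
            for t in ts]
    return max(errs)

beta = np.array([1.0, -2.0, 0.0])
ts = np.linspace(0.0, 5.0, 51)        # grid on the compact set T = [0, 5]
for n in (100, 1000, 10000, 100000):
    print(n, ridge_path_sup_error(n, beta, ts))
```

The supremum is driven by the largest $t$ in $T$, where the shrinkage bias is of order $t n^{-1/2}\|\beta\|$; it shrinks as $n$ grows, as Theorem 4 predicts.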

3.2. Uniform consistency in the convex case. In this section, we consider the following 
assumption. 

Assumption 4 (convexity assumption). $\Phi$ is a convex subset of a Euclidean space endowed with the norm $\|\cdot\|$ and $M_n$ is a convex real-valued function on $\Phi$ almost surely. Let $V \subset \Phi$ be a neighborhood of the point $\beta$ and $M$ be a strictly convex real-valued function defined on $V$ such that

(i) for any $\phi \in V$, $M_n(\phi) \xrightarrow{P} M(\phi)$;

(ii) $M(\phi) > M(\beta)$ for all $\phi \in V \setminus \{\beta\}$.

Convex M-estimation is considered in Haberman [1989] and somewhat simplified in Niemiro [1992]. In the following result the convexity assumption plays a twofold role. First, it implies Assumption 3. Second, if the contrast $M_n$ or the penalty $J_n$ is strictly convex, then the minimization of (1) has a unique solution and this solution path is continuous, which allows us to replace the outer probability in (13) by a standard probability. Convexity is also useful in practice since $\hat\beta_n(t)$ can be computed using efficient numerical procedures for convex optimization (see Boyd and Vandenberghe [2004]).
Theorem 5. Suppose that Assumption 4 holds. Let $(J_n)$ be a sequence of non-negative functions defined on $\Phi$ such that $J_n(\beta) \to 0$ and define $\Lambda_n$ as in (1). Then the three following assertions hold.

(a) For any $L > 0$, if we have a $\Phi$-valued process $\{\hat\beta_n(t),\ t \geq 0\}$ satisfying (12) with $T = [0, L]$, then $\hat\beta_n(t)$ converges to $\beta$ uniformly in $t \in [0, L]$, in $P^*$-probability, that is, (13) holds.

(b) If $J_n$ is strictly convex on $\Phi$, then it is always possible to define a deterministic non-negative sequence $(L_n)$ with $L_n \to \infty$, a sequence $(A_n)$ of events in $\mathcal{F}$ with $P(A_n) \to 1$, and, for each $n$, a collection $\{\hat\beta_n(t),\ t \geq 0\}$ of r.v.'s satisfying the two following properties.

(b1) For all $t \in [0, L_n]$ and $\omega \in A_n$, $\hat\beta_n(\omega, t)$ is a minimizer of $\Lambda_n(\phi, t)$ over $\phi \in \Phi$, and this minimizer is unique for $t > 0$.

(b2) For all $\omega \in \Omega$, $\hat\beta_n(\omega, \cdot)$ is a continuous function on $(0, L_n]$ and on $(L_n, \infty)$.

As consequences, (12) holds with $T = [0, L]$ for any $L > 0$ and the uniform convergence (13) holds in $P$-probability, that is,

$$\sup_{t \in [0, L]} \|\hat\beta_n(t) - \beta\| \xrightarrow{P} 0. \tag{14}$$

(c) If $M_n$ is strictly convex on $\Phi$ for all $n$, then the conclusions of (b) hold with Properties (b1) and (b2) strengthened as follows.

(c1) For all $t \in [0, L_n]$ and $\omega \in A_n$, $\hat\beta_n(\omega, t)$ is the unique minimizer of $\Lambda_n(\phi, t)$ over $\phi \in \Phi$.

(c2) For all $\omega \in \Omega$, $\hat\beta_n(\omega, \cdot)$ is a continuous function on $[0, L_n]$ and on $(L_n, \infty)$.

Remark 7. The proof of Assertion (c) is somewhat simpler than that of Assertion (b). However, in some cases, the first purpose of the penalty $J_n$ is precisely to solve an ill-posed problem, such as in ridge regression (see Hoerl and Kennard [1970]) where $M_n(\phi) = \sum_k (y_k - x_k^T \phi)^2$, $J_n(\phi) \propto \|\phi\|^2$ and the regression matrix $X_n = [x_1 \ldots x_n]^T$ is not full rank. Then $J_n$ is strictly convex and $M_n$ is not, in which case Assertion (b) can be useful.
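This situation can be checked numerically: with more regressors than observations, $X_n^TX_n$ is singular and $M_n$ is flat along the null space of $X_n$, so it is not strictly convex, whereas adding $t\|\phi\|^2$ with $t > 0$ yields the unique minimizer $(X_n^TX_n + tI)^{-1}X_n^Ty$. A minimal sketch, assuming $J_n(\phi) = \|\phi\|^2$ (the proportionality constant being absorbed in $t$) and random illustrative data:

```python
import numpy as np

def ridge_minimizer(X, y, t):
    """Unique minimizer, for t > 0, of sum_k (y_k - x_k^T phi)^2 + t * ||phi||^2,
    i.e. the solution of (X^T X + t I) phi = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + t * np.eye(p), X.T @ y)

def M_n(phi):
    """Unpenalized contrast sum_k (y_k - x_k^T phi)^2 of Remark 7."""
    return np.sum((y - X @ phi) ** 2)

rng = np.random.default_rng(2)
n, p, t = 5, 8, 0.1                  # more regressors than observations (illustrative sizes)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

phi_hat = ridge_minimizer(X, y, t)
grad = -2.0 * X.T @ (y - X @ phi_hat) + 2.0 * t * phi_hat   # gradient of the penalized criterion
v = np.linalg.svd(X)[2][-1]          # unit vector in the null space of X (rank(X) <= n < p)

print(np.linalg.matrix_rank(X))          # rank(X^T X) = rank(X) <= n = 5 < p = 8
print(np.max(np.abs(grad)))              # ~0: first-order condition holds at phi_hat
print(M_n(phi_hat + v) - M_n(phi_hat))   # ~0: M_n is flat along v, so not strictly convex
```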

3.3. Functional central limit theorem. Some general conditions for proving $\sqrt{n}$ asymptotic normality for M-estimators rely on the so-called stochastic differentiability condition introduced in Pollard [1985]. They exploit the idea, introduced in Huber [1967], of using strong differentiability conditions on the limit contrast function rather than on the contrast process. Moreover it is explained in Pollard [1985] how empirical process theory can be used to prove the stochastic differentiability condition. Extensions of these ideas can be found in Van der Vaart and Wellner [1996].

In Pollard [1985], Pollard proves the asymptotic normality of M-estimators based on a contrast process of the form

$$M_n(\phi) = n^{-1} \sum_{k=1}^{n} g(\xi_k, \phi) = P_n g(\cdot, \phi), \tag{15}$$

where $(\xi_k)$ is a sequence of $\mathcal{X}$-valued random variables and $g$ is an $\mathcal{X} \times \mathbb{R}^p \to \mathbb{R}$ function satisfying the following Taylor expansion around a given point $\beta \in \mathbb{R}^p$:

$$g(x, \phi) = g(x, \beta) + (\phi - \beta)^T \Delta(x) + \|\phi - \beta\|\, r(x, \phi). \tag{16}$$

We will show that if the $\sqrt{n}$ asymptotic normality conditions in Pollard [1985] are verified and if the penalty satisfies mild asymptotic conditions, then the penalized version of the M-estimator satisfies a CLT similar to the CLT in Knight and Fu [2000] for the mean square criterion. Moreover this CLT applies to the regularization path in a functional sense.

Let us recall Pollard's conditions that we will use on the contrast process $M_n$ defined by (15) and (16):

(P-1) $(\xi_k)$ is a sequence of i.i.d. random variables with distribution $P$;

(P-2) the function $M(\phi) = P g(\cdot, \phi)$ has a nonsingular second derivative $\Gamma$ at $\beta \in \mathbb{R}^p$;

(P-3) $P\|\Delta\|^2 < \infty$ and $P\Delta = 0$;

(P-4) the stochastic differentiability condition holds on $r$, that is, for any sequence of positive r.v.'s $(r_n)$ such that $r_n \xrightarrow{P} 0$,
$$\sup_{\|\phi - \beta\| \leq r_n} \frac{|\nu_n r(\cdot, \phi)|}{1 + \sqrt{n}\,\|\phi - \beta\|} \xrightarrow{P} 0. \tag{17}$$

Here we used the notations, standard in the empirical process literature, $Pf$, $P_n f$ and $\nu_n f$ for $\int f\, dP$, $n^{-1} \sum_{k=1}^{n} f(\xi_k)$ and $\sqrt{n}(P_n f - P f)$, respectively. Theorem 6 below provides a central limit theorem for the regularization path defined by the penalized contrast (1) when $M_n$ satisfies Pollard's conditions (P-1)-(P-4), under some mild conditions on the penalty $J_n$.

Theorem 6. Let $\Phi = \mathbb{R}^p$, $p \geq 1$, and let $T$ be a compact subset of $[0, \infty)$. Define $\Lambda_n$ as in (1), where $M_n$ is defined by (15) and satisfies Pollard's conditions (P-1)-(P-4), and $J_n$ is a sequence of deterministic non-negative functions defined on $\mathbb{R}^p$. Further assume that there exists a positive constant $C$ such that

$$n\,|J_n(\phi) - J_n(\beta)| \leq C\,(1 + \sqrt{n}\,\|\phi - \beta\|) \quad \text{for } \|\phi - \beta\| \leq 1, \tag{18}$$

and, for any compact $K \subset \mathbb{R}^p$,

$$\sup_{\phi \in K} \left| n J_n(\beta + n^{-1/2}\phi) - n J_n(\beta) - J_\infty(\phi) \right| \to 0, \tag{19}$$

where $J_\infty$ is a real-valued function on $\mathbb{R}^p$. Let $\{\hat\beta_n(t),\ t \in T\}$ be a sequence of $\Phi$-valued processes satisfying (12) and such that the uniform $P^*$-consistency (13) holds. Let $W$ be a centered Gaussian $p$-dimensional vector with covariance $P(\Delta\Delta^T)$ and define

$$L(\phi, t) = W^T \phi + \tfrac{1}{2}\,\phi^T \Gamma \phi + t\,J_\infty(\phi). \tag{20}$$

Finally assume that there exists a $\Phi$-valued process $\{u(t),\ t \in T\}$ such that Conditions (ii) and (iii) in Theorem 3 hold. Then there is a version of $u$ in $\ell^\infty(T, \mathbb{R}^p)$ and

$$\sqrt{n}(\hat\beta_n - \beta) \rightsquigarrow u. \tag{21}$$

The following lemma shows that the penalties considered in Knight and Fu [2000] satisfy Conditions (18) and (19).



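Lemma 1. Let $\gamma > 0$ and define, for all $\phi \in \mathbb{R}^p$,

$$J_n^{(\gamma)}(\phi) = n^{(1 \wedge \gamma)/2 - 1} \sum_{j=1}^{p} |\phi_j|^\gamma. \tag{22}$$

Then for any $\beta \in \mathbb{R}^p$, there exists $C > 0$ such that, for all $\phi \in \mathbb{R}^p$,

$$n\,\left| J_n^{(\gamma)}(\phi) - J_n^{(\gamma)}(\beta) \right| \leq C \left( 1 + \sqrt{n}\,\|\phi - \beta\| + \sqrt{n}\,\|\phi - \beta\|^{1 \vee \gamma} \right), \tag{23}$$

and, for any compact $K \subset \mathbb{R}^p$,

$$\sup_{\phi \in K} \left| n J_n^{(\gamma)}(\beta + n^{-1/2}\phi) - n J_n^{(\gamma)}(\beta) - J_\infty^{(\gamma)}(\phi) \right| \to 0, \tag{24}$$

where

$$J_\infty^{(\gamma)}(\phi) = \begin{cases} \sum_{j=1}^{p} |\phi_j|^\gamma\, \mathbf{1}_{\{\beta_j = 0\}} & \text{if } \gamma < 1, \\ \sum_{j=1}^{p} \left[ \phi_j \operatorname{sgn}(\beta_j)\, \mathbf{1}_{\{\beta_j \neq 0\}} + |\phi_j|\, \mathbf{1}_{\{\beta_j = 0\}} \right] & \text{if } \gamma = 1, \\ \gamma \sum_{j=1}^{p} \phi_j \operatorname{sgn}(\beta_j)\, |\beta_j|^{\gamma - 1} & \text{if } \gamma > 1. \end{cases} \tag{25}$$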

Remark 8. The limit penalties in (25) correspond to those in Theorems 2 and 3 in Knight and Fu [2000], except for the multiplicative constant $\gamma$ in the case $\gamma > 1$, which seems to have been forgotten in Knight and Fu [2000].
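The convergence (24) can be checked numerically in the three regimes of $\gamma$. The minimal sketch below implements the normalization (22) and the limit obtained from a first-order expansion of $|\beta_j + n^{-1/2}\phi_j|^\gamma$ (for $\beta_j \neq 0$) and from the scaling $n^{\gamma/2}|n^{-1/2}\phi_j|^\gamma = |\phi_j|^\gamma$ (for $\beta_j = 0$). The helper names, the vectors $\beta$ (with one zero component) and $\phi$, and the value $n = 10^8$ are arbitrary illustrative choices. For $\gamma < 1$ the approximation error only decays like $n^{(\gamma - 1)/2}$, so agreement is looser in that case.

```python
import numpy as np

def J_n(phi, n, gamma):
    """J_n^{(gamma)}(phi) = n^{(1 ^ gamma)/2 - 1} * sum_j |phi_j|^gamma, cf. (22)."""
    return n ** (min(1.0, gamma) / 2.0 - 1.0) * np.sum(np.abs(phi) ** gamma)

def J_inf(phi, beta, gamma):
    """Limit penalty J_infty^{(gamma)}(phi)."""
    nz = beta != 0
    if gamma < 1:
        return np.sum(np.abs(phi[~nz]) ** gamma)
    if gamma == 1:
        return np.sum(phi[nz] * np.sign(beta[nz])) + np.sum(np.abs(phi[~nz]))
    return gamma * np.sum(phi[nz] * np.sign(beta[nz]) * np.abs(beta[nz]) ** (gamma - 1))

def rescaled_increment(phi, beta, n, gamma):
    """n J_n(beta + n^{-1/2} phi) - n J_n(beta), the left-hand quantity in (24)."""
    return n * (J_n(beta + phi / np.sqrt(n), n, gamma) - J_n(beta, n, gamma))

beta = np.array([1.5, 0.0, -0.5])
phi = np.array([0.3, -1.0, 2.0])
for gamma in (0.5, 1.0, 2.0):
    print(gamma, rescaled_increment(phi, beta, 1e8, gamma), J_inf(phi, beta, gamma))
```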



4. Pathwise M-estimation 

It turns out that the specific form of the contrast $\Lambda_n$ in (1) is not fundamental for the basic arguments yielding the consistency and the CLT in Theorems 4 and 6, respectively. Here we provide results formulated in the more general form where $\hat\beta_n(t)$ is a near minimizer of $\Lambda_n(\cdot, t)$ for all $t \in T$. Moreover the true parameter $\beta$ itself is defined as a map on $T$, with $\beta(t)$ defined as the minimizer of the limit contrast $\Lambda(\cdot, t)$ for all $t \in T$. We refer to this general situation as pathwise M-estimation.

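4.1. Uniform consistency. Theorem 4 is obtained by applying the following general result on pathwise M-estimators.

Proposition 1. Let $\Phi$ be a subset of a metric space endowed with the metric $d$ and $T$ be any set. Let $\Lambda$ be a real-valued function defined on $\Phi \times T$, $\{\Lambda_n(\phi, t),\ \phi \in \Phi, t \in T\}$ a sequence of real-valued processes, $\beta$ a $T \to \Phi$ map and $\{\hat\beta_n(t),\ t \in T\}$ a sequence of $\Phi$-valued processes such that

(i) $\sup_{t \in T} \sup_{\phi \in \Phi} \{\Lambda(\phi, t) - \Lambda_n(\phi, t)\}_+ \xrightarrow{P^*} 0$;

(ii) $\sup_{t \in T} |\Lambda_n(\beta(t), t) - \Lambda(\beta(t), t)| \xrightarrow{P^*} 0$;

(iii) for all $\epsilon > 0$,
$$\inf_{t \in T} \left[ \inf\{\Lambda(\phi, t) : \phi \in \Phi,\ d(\phi, \beta(t)) \geq \epsilon\} - \Lambda(\beta(t), t) \right] > 0;$$

(iv) $\sup_{t \in T} \{\Lambda_n(\hat\beta_n(t), t) - \Lambda_n(\beta(t), t)\}_+ \xrightarrow{P^*} 0$.

Then $\hat\beta_n(t)$ converges to $\beta(t)$ uniformly in $t \in T$, in $P^*$-probability, that is,

$$\sup_{t \in T} d(\hat\beta_n(t), \beta(t)) \xrightarrow{P^*} 0. \tag{26}$$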
4.2. Functional central limit theorem. We now extend the setting of Pollard [1985] to pathwise M-estimation. First we obtain the $\sqrt{n}$-rate of convergence in the sup norm; second we apply Theorem 3 to obtain a functional CLT for pathwise M-estimators. Theorem 6 is a direct application of this result in the context of penalized M-estimation.

Proposition 2. Let $\Phi$ be a subset of a metric space endowed with the metric $d$ and $T$ be any set. Let $\{\Lambda_n(\phi, t),\ \phi \in \Phi, t \in T\}$ be a sequence of real-valued processes, $\beta$ be a $T \to \Phi$ map and $\{\hat\beta_n(t),\ t \in T\}$ be a sequence of $\Phi$-valued processes such that

$$\sup_{t \in T} \{\Lambda_n(\hat\beta_n(t), t) - \Lambda_n(\beta(t), t)\}_+ = O_{P^*}(n^{-1}), \tag{27}$$

and the uniform $P^*$-consistency (26) holds. Assume that we have the following decomposition of the contrast process,

$$\Lambda_n(\phi, t) - \Lambda_n(\beta(t), t) = G_n(\phi, t) + H(\phi, t) + d(\phi, \beta(t))\, R_n(\phi, t), \tag{28}$$

where $G_n$, $H$ and $R_n$ satisfy

(i) $\{G_n(\phi, t),\ \phi \in \Phi, t \in T\}$ is a sequence of real-valued processes such that
$$\sup_{\phi \in \Phi} \sup_{t \in T} \frac{n\,|G_n(\phi, t)|}{1 + \sqrt{n}\, d(\phi, \beta(t))} = O_{P^*}(1); \tag{29}$$

(ii) $H$ is a real-valued function defined on $\Phi \times T$ such that there exists $\epsilon > 0$ for which … (30);

(iii) $\{R_n(\phi, t),\ \phi \in \Phi, t \in T\}$ is a sequence of real-valued processes such that, for any positive random sequence $(r_n)$ converging to 0 in $P^*$-probability,
$$\sup_{t \in T} \sup\{|R_n(\phi, t)| : \phi \in \Phi,\ d(\phi, \beta(t)) \leq r_n\} = o_{P^*}(r_n) + O_{P^*}(n^{-1/2}). \tag{31}$$

Then $\hat\beta_n(t)$ converges to $\beta(t)$ uniformly in $t \in T$, in $P^*$-probability, with rate at least $\sqrt{n}$, that is,

$$\sup_{t \in T} \sqrt{n}\, d(\hat\beta_n(t), \beta(t)) = O_{P^*}(1). \tag{32}$$

Applying Proposition 2 and Theorem 3, we get the following result.

Theorem 7. Let $\Phi = \mathbb{R}^p$, $p \geq 1$, and $T$ be any set. Let $\{\Lambda_n(\phi, t),\ \phi \in \Phi, t \in T\}$ be a sequence of real-valued processes, $\beta$ be a $T \to \Phi$ map and $\{\hat\beta_n(t),\ t \in T\}$ be a sequence of $\Phi$-valued processes such that

$$\sup_{t \in T} \{\Lambda_n(\hat\beta_n(t), t) - \Lambda_n(\beta(t), t)\}_+ = o_{P^*}(n^{-1}), \tag{33}$$

and the uniform $P^*$-consistency (26) holds. Assume that the decomposition (28) of the contrast process holds, where $G_n$, $H$ and $R_n$ satisfy:

(i) $\{G_n(\phi, t),\ \phi \in \Phi, t \in T\}$ is a sequence of real-valued processes satisfying (29);

(ii) $H$ is a real-valued function defined on $\Phi \times T$ and there exists a function $\Gamma$ defined on $T$ and taking values in the set of non-negative symmetric $p \times p$ matrices such that, denoting by $\lambda_{\min}(\Gamma(t))$ and $\lambda_{\max}(\Gamma(t))$ the smallest and largest eigenvalues of $\Gamma(t)$,
$$0 < \inf\{\lambda_{\min}(\Gamma(t)),\ t \in T\} \leq \sup\{\lambda_{\max}(\Gamma(t)),\ t \in T\} < \infty, \tag{34}$$
and, as $\phi \to \beta$ in $\ell^\infty(T, \mathbb{R}^p)$,
$$\sup_{t \in T} \left| H(\phi(t), t) - (\phi(t) - \beta(t))^T \Gamma(t) (\phi(t) - \beta(t)) \right| = o\left( \|\phi - \beta\|_T^2 \right); \tag{35}$$

(iii) $\{R_n(\phi, t),\ \phi \in \Phi, t \in T\}$ is a sequence of real-valued processes such that, for any positive random sequence $(r_n)$ converging to 0 in $P^*$-probability,
$$\sup_{t \in T} \sup\{|R_n(\phi, t)| : \phi \in \Phi,\ d(\phi, \beta(t)) \leq r_n\} = o_{P^*}(r_n) + o_{P^*}(n^{-1/2}). \tag{36}$$

Let us further define
$$\tilde G_n(\phi, t) = n\, G_n(\beta(t) + n^{-1/2}\phi, t), \tag{37}$$
and assume that there exists a real-valued process $\{G(\phi, t),\ \phi \in \Phi, t \in T\}$ such that, for any compact $K \subset \Phi$, $G$ is tight in $\ell^\infty(K \times T)$ and $\tilde G_n \rightsquigarrow G$ in $\ell^\infty(K \times T)$. Define
$$L(\phi, t) = G(\phi, t) + \phi^T \Gamma(t) \phi, \tag{38}$$
and assume that there exists a $\Phi$-valued process $\{u(t),\ t \in T\}$ such that Conditions (ii) and (iii) in Theorem 3 hold. Then there is a version of $u$ in $\ell^\infty(T, \mathbb{R}^p)$ and
$$\sqrt{n}(\hat\beta_n - \beta) \rightsquigarrow u. \tag{39}$$

Remark 9. Observe that Eqs. (33) and (36) are strengthened versions of (27) and (31), respectively, and that (34) and (35) imply (30). Hence the conditions of Theorem 7 imply those of Proposition 2.



5. Examples 

The uniform consistency and a functional central limit theorem for the lasso regularization path are respectively given in Theorems 1 and 2. Theorems 5 and 6 allow many extensions, some examples of which are given in this section. In Pollard [1985], a wide variety of models and functions $g$ are shown to satisfy Conditions (P-1)-(P-4). These conditions apply to the generalized linear model (GLM), as this model satisfies the pointwise assumptions of [Pollard, 1985, Section 4] (under some moment conditions). They also apply to the least absolute deviation (LAD) criterion, see Example 8 in [Pollard, 1985, Section 6] (again under some moment conditions on the model). We briefly state the corresponding results in these two cases as examples of applications of Theorem 6. Uniform consistency in both examples is obtained as an application of Theorem 5, since in these cases $M_n$ is convex. For these two examples, we consider the $\ell^1$ and $\ell^2$ penalties. They fit the conditions of Theorem 6, as they satisfy (18) and (19) by Lemma 1. Observe however that the function $J_\infty$ in Lemma 1 depends on the chosen penalty, and thus so does the limit $u$ in (21). We conclude this section with a discussion of the Akaike information criterion (AIC), which corresponds to an $\ell^0$ penalty.

5.1. $\ell^1$-penalized GLM. Consider a canonical exponential family of densities

$$p(y \mid \theta) = h(y) \exp\{y\theta - b(\theta)\},$$

with respect to a dominating measure $\mu$. The function $b$, sometimes called the log-partition function, is given by

$$b(\theta) = \log \int h(y) \exp\{y\theta\}\, \mu(dy),$$
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and thus is strictly convex and infinitely differentiable. In a GLM, one observes a sequence 
of i.i.d. M X RP-valued r.v.'s (y^jX^), k = 1, . . . ,n, where have conditional density 
p(-|x|^/3), given x^, with P ^ W denoting the unknown parameter of interest. In this 
context, the non-penalized contrast process is given by the negated log-likelihood 

n 
k=l 

where ^((x, y),cf)) = —yx^cf) + b{x^(f)). Using that g is convex and smooth, and assuming 
some appropriate moment conditions on xi for obtaining Pollard's conditions (P-[T])-(P- 
lU, we get the uniform consistency and a functional CLT on the regularization path /3,„(t) 
defined as the minimizer of © with Jnicf)) = n-V2 ^p^^ |0.| (this is the f penalty 
defined in (l22l)). In particular, for any L > 0, 

where the limit u is defined as in the lasso case as the minimizer of (O with C = E[6"(x![^/3)xi 
(assumed positive-definit e) and U ^ Af(0 , C). T he numerical computation of /3„(t) can be 
processed as proposed in lPark and Hastiel ll2007ll . 
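The following sketch shows how $\hat\beta_n(t)$ might be approximated on a grid of penalty weights for the Bernoulli (logistic) member of the family, where $b(\theta) = \log(1 + e^\theta)$. It is an illustration under stated assumptions, not the algorithm of Park and Hastie [2007]. It uses plain proximal gradient descent (ISTA) with warm starts instead, applies the penalty $t\, n^{-1/2}\|\phi\|_1$ from (22), and omits an intercept. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # b'(z) for the Bernoulli family b(z) = log(1 + e^z), computed stably
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, thr):
    # proximal map of thr * |.|
    return math.copysign(max(abs(v) - thr, 0.0), v)


def l1_logistic_path(X, y, ts, n_iter=500):
    """Approximate minimizers of M_n(phi) + t * n^{-1/2} * ||phi||_1 for each t in ts.

    M_n is the negated Bernoulli log-likelihood divided by n; X is a list of rows x_k
    and y a list of 0/1 responses. Proximal gradient (ISTA) with warm starts.
    """
    n, p = len(X), len(X[0])
    # b'' <= 1/4 and lambda_max(n^{-1} sum x x^T) <= trace, so this bounds the
    # Lipschitz constant of grad M_n
    lip = 0.25 * sum(v * v for row in X for v in row) / n
    step = 1.0 / lip
    phi = [0.0] * p
    path = []
    for t in ts:
        thr = step * t / math.sqrt(n)
        for _ in range(n_iter):
            # grad M_n(phi) = n^{-1} sum_k x_k (b'(x_k^T phi) - y_k)
            grad = [0.0] * p
            for xk, yk in zip(X, y):
                resid = sigmoid(sum(a * b for a, b in zip(xk, phi))) - yk
                for j in range(p):
                    grad[j] += resid * xk[j] / n
            phi = [soft_threshold(phi[j] - step * grad[j], thr) for j in range(p)]
        path.append(list(phi))
    return path
```

The trace bound on the Lipschitz constant is deliberately crude and only affects the step size. For exact minimizers, $\|\hat\beta_n(t)\|_1$ is non-increasing in $t$, which gives a simple check on the output.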

5.2. $\ell^1$- and $\ell^2$-penalized LAD. Given a sequence of $\mathbb{R} \times \mathbb{R}^p$-valued r.v.'s $(y_k, x_k)$, $k = 1, \dots, n$, the LAD criterion is defined as

$$M_n(\phi) = n^{-1} \sum_{k=1}^n |y_k - x_k^T\phi| .$$

It can be used to estimate the parameter $\beta \in \mathbb{R}^p$ of a linear regression model $y_k = x_k^T\beta + \epsilon_k$, with $(\epsilon_k)$ and $(x_k)$ two independent sequences of i.i.d. r.v.'s. This contrast process is an alternative to the mean square criterion, resulting in an estimator less sensitive to the presence of outliers (for $x_k = 1$, the minimizer of $M_n$ is the sample median). In contrast to the previous case, the contrast is not smooth, since its first derivative is discontinuous. However, as shown e.g. in Pollard [1985], the minimizer of this contrast is asymptotically normal, provided some moment conditions hold and

$$G(\phi) = E[|\epsilon_1 + x_1^T(\beta - \phi)|]$$

has a non-singular second derivative at $\phi = \beta$. Observe that

$$G(\phi) = E[|\epsilon_1|] + E\left[ x_1^T(\beta - \phi) + 2 \int_0^{x_1^T(\phi - \beta)} F(s)\, ds \right] ,$$

where $F$ denotes the cumulative distribution function of $\epsilon_1$. Thus, if $\epsilon_1$ has a continuous density $f$, the second derivative of $G$ at $\beta$ is $\Gamma = 2 f(0)\, E[x_1 x_1^T]$. Because the LAD criterion uses the $\ell^1$ error function, the $\ell^1$ penalty $J_n(\phi) = n^{-1/2} \sum_{i=1}^p |\phi_i|$ could seem more natural. On the contrary, Theorem 6 suggests that using an $\ell^1$ error function in the contrast does not modify the asymptotic distribution of the regularization path; only the choice of the penalty does. In other words, the regularization paths of the $\ell^1$- and $\ell^2$-penalized LAD have asymptotic distributions similar to those of the lasso and the ridge regression, respectively.
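The expression for $G$ and the value $\Gamma = 2f(0)E[x_1x_1^T]$ can be checked numerically. The sketch below takes the scalar case $x_1 = 1$ and, purely for illustration, assumes $\epsilon_1 \sim \mathcal{N}(0, 1)$. Then $G(\beta + c) = E|\epsilon_1 - c|$ has a closed form, which the code compares with $E|\epsilon_1| - c + 2\int_0^c F(s)\,ds$ computed by Simpson's rule. It also estimates the second derivative at $c = 0$ by a central difference. The helper names are hypothetical.

```python
import math


def std_normal_cdf(s):
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))


def std_normal_pdf(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def mean_abs_shift(c):
    # closed form of E|eps - c| for eps ~ N(0, 1); equals G(beta + c) when x_1 = 1
    return 2.0 * std_normal_pdf(c) + c * (2.0 * std_normal_cdf(c) - 1.0)


def via_cdf_identity(c, m=2000):
    # E|eps| - c + 2 * int_0^c F(s) ds, with the integral by composite Simpson (m even)
    h = c / m
    acc = std_normal_cdf(0.0) + std_normal_cdf(c)
    for i in range(1, m):
        acc += (4.0 if i % 2 else 2.0) * std_normal_cdf(i * h)
    return mean_abs_shift(0.0) - c + 2.0 * acc * h / 3.0


# central difference for the second derivative of G at beta, to compare with 2 f(0)
h = 1e-3
second = (mean_abs_shift(h) - 2.0 * mean_abs_shift(0.0) + mean_abs_shift(-h)) / h ** 2
print(second, 2.0 * std_normal_pdf(0.0))
```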
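Let us now specify the limit distribution of the regularization path $\hat\beta_n(t)$ defined as the minimizer of (1) with $J_n(\phi) = n^{-1/2} \sum_{i=1}^p |\phi_i|$ and $J_n(\phi) = n^{-1/2} \sum_{i=1}^p \phi_i^2$, respectively (these are the $\ell^1$ and $\ell^2$ penalties $J_n^{(1)}$ and $J_n^{(2)}$ defined in (22)). Under appropriate moment conditions on $(\epsilon_1, x_1)$ implying Pollard's conditions (P-1)–(P-4) (in particular $E[\mathrm{sgn}(\epsilon_1)] = 0$ and $E[\|x_1\|^2] < \infty$, so that $E[\Delta] = 0$, $E[\|\Delta\|^2] < \infty$ and $G$ is minimized at $\phi = \beta$), one has, for any $L > 0$,

$$\sqrt{n}\,(\hat\beta_n - \beta) \rightsquigarrow u \quad \text{in } \ell^\infty([0, L], \mathbb{R}^p) ,$$

where the limit $u$ is defined as the minimizer of (20), where $\Gamma$ is the (non-singular) second derivative of $G$ at $\phi = \beta$, $W \sim \mathcal{N}(0, E[x_1 x_1^T])$, and $J_\infty$ depends on the penalty. Namely, for the $\ell^1$ penalty, one has $J_\infty = J_\infty^{(1)}$ and, for the $\ell^2$ penalty, one has $J_\infty = J_\infty^{(2)}$, where $J_\infty^{(\gamma)}$ is defined by

$$J_\infty^{(1)}(\phi) = \sum_{k=1}^p \left\{ \mathrm{sgn}(\beta_k)\, \phi_k\, 1\{\beta_k \neq 0\} + |\phi_k|\, 1\{\beta_k = 0\} \right\} , \qquad J_\infty^{(2)}(\phi) = 2 \sum_{k=1}^p \beta_k \phi_k .$$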
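The limits $J_\infty^{(1)}$ and $J_\infty^{(2)}$ can be compared numerically with the rescaled increments $n[J_n(\beta + n^{-1/2}\phi) - J_n(\beta)]$ that appear in the proof of Theorem 6. The sketch below assumes the normalization $J_n(\phi) = n^{-1/2}\sum_i |\phi_i|^\gamma$ used above and handles only $\gamma \in \{1, 2\}$. The function names are hypothetical.

```python
import math


def J_n(phi, n, gamma):
    # l^gamma penalty normalized as n^{-1/2} * sum_k |phi_k|^gamma
    return sum(abs(v) ** gamma for v in phi) / math.sqrt(n)


def rescaled_increment(beta, phi, n, gamma):
    # n [J_n(beta + n^{-1/2} phi) - J_n(beta)]
    shifted = [b + v / math.sqrt(n) for b, v in zip(beta, phi)]
    return n * (J_n(shifted, n, gamma) - J_n(beta, n, gamma))


def J_inf(beta, phi, gamma):
    # gamma |b|^{gamma-1} sgn(b) phi_k on coordinates with b != 0, plus |phi_k| on
    # coordinates with b == 0 when gamma == 1 (that contribution vanishes for gamma == 2)
    total = 0.0
    for b, v in zip(beta, phi):
        if b != 0.0:
            total += gamma * abs(b) ** (gamma - 1) * math.copysign(1.0, b) * v
        elif gamma == 1:
            total += abs(v)
    return total
```

For $\gamma = 1$ the increment equals the limit exactly as soon as $n^{-1/2}|\phi_k| < |\beta_k|$ for every non-zero $\beta_k$. For $\gamma = 2$ the gap is $n^{-1/2}\sum_k \phi_k^2$.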

5.3. Akaike information criterion and the $\ell^0$ penalty. Consider a parametric family of densities $\{p_\phi, \phi \in \Phi\}$ defined on $\mathcal{X}^n$ for modelling the distribution of the observations $X_1, \dots, X_n$. The Akaike information criterion (AIC) was proposed in Akaike [1973] as the negated log-likelihood criterion penalized by the dimension of the parameter. It can be defined (up to a multiplicative factor which does not change its minimizer) as

$$\mathrm{AIC}(\phi) = A_n(\phi, 1) ,$$

where $A_n$ is defined by (1) with $M_n(\phi) = -n^{-1} \log p_\phi(X_1, \dots, X_n)$ and

$$J_n^{(0)}(\phi) = n^{-1}\, \#\{k : \phi_k \neq 0\} ,$$

where $\# A$ denotes the cardinality of the set $A$. We note that it corresponds to an $\ell^0$ penalty, that is, to $\gamma = 0$ in (22), although this case is not considered in Knight and Fu [2000]. It is not usually assumed that $\Phi$ is finite-dimensional in the presentation of the AIC. However, in practice, the minimization of $\mathrm{AIC}(\phi)$ requires numerically minimizing $M_n(\phi)$ for each possible submodel, which corresponds to a given value of the sequence $(1(\phi_k \neq 0))_{k \geq 1}$. This makes sense only in a finite-dimensional setting, $\Phi \subset \mathbb{R}^p$, with $p$ not too large (say $p \leq 15$), since $2^p$ numerical minimizations of $M_n$ are then necessary.
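The sketch below makes the $2^p$ count concrete. It enumerates all supports for a Gaussian linear regression and assumes a known unit noise variance, so that $M_n(\phi) = (2n)^{-1}\|y - X\phi\|^2$ up to an additive constant that does not affect the minimizer. Each submodel is fitted by least squares through the normal equations. With $t = 1$, the criterion $M_n(\phi) + t\, n^{-1}\#\{k : \phi_k \neq 0\}$ is the AIC. All names are hypothetical, and the solver is a textbook Gaussian elimination.

```python
import itertools


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small system A x = b
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def submodel_fit(X, y, S):
    """Minimize M_n(phi) = (2n)^{-1} ||y - X phi||^2 over phi supported on S."""
    n, p = len(X), len(X[0])
    phi = [0.0] * p
    if S:
        A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in S] for i in S]
        b = [sum(X[k][i] * y[k] for k in range(n)) for i in S]
        for i, v in zip(S, solve(A, b)):
            phi[i] = v
    rss = sum((y[k] - sum(X[k][j] * phi[j] for j in range(p))) ** 2 for k in range(n))
    return phi, rss / (2.0 * n)


def aic_minimizer(X, y, t=1.0):
    """Enumerate the 2^p supports and minimize M_n(phi) + t * #{k: phi_k != 0} / n."""
    n, p = len(X), len(X[0])
    best = None
    for size in range(p + 1):
        for S in itertools.combinations(range(p), size):
            phi, m = submodel_fit(X, y, list(S))
            crit = m + t * size / n
            if best is None or crit < best[0]:
                best = (crit, S, phi)
    return best
```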

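Observe that, for any fixed $\phi \in \mathbb{R}^p$, we have $n J_n^{(0)}(\phi) \leq p$ and, for any $\beta$ and any $r > 0$, we have, for $n$ large enough,

$$n J_n^{(0)}(\beta + n^{-1/2}\phi) - n J_n^{(0)}(\beta) = J_\infty^{(0)}(\phi) \quad \text{for all } \|\phi\| \leq r ,$$

where

$$J_\infty^{(0)}(\phi) = \sum_{k=1}^p 1(\beta_k = 0 \text{ and } \phi_k \neq 0) .$$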
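The identity for $J_\infty^{(0)}$ requires $n$ to be large enough that the perturbation $n^{-1/2}\phi_k$ cannot cancel a non-zero $\beta_k$. The sketch below shows both regimes using hypothetical helper names. It works with the support size $nJ_n^{(0)}(\phi)$ directly, which avoids dividing by $n$ and multiplying back in floating point.

```python
import math


def support_size(phi):
    # n * J_n^{(0)}(phi) = #{k : phi_k != 0}
    return sum(1 for v in phi if v != 0.0)


def rescaled_increment_l0(beta, phi, n):
    # n J_n^{(0)}(beta + n^{-1/2} phi) - n J_n^{(0)}(beta)
    shifted = [b + v / math.sqrt(n) for b, v in zip(beta, phi)]
    return support_size(shifted) - support_size(beta)


def J0_inf(beta, phi):
    # J_inf^{(0)}(phi) = sum_k 1(beta_k = 0 and phi_k != 0)
    return sum(1 for b, v in zip(beta, phi) if b == 0.0 and v != 0.0)


beta = [1.0, 0.0, -2.0, 0.0]
phi = [-1.0, 3.0, 0.5, 0.0]
print(rescaled_increment_l0(beta, phi, 1), J0_inf(beta, phi))    # n = 1: first coordinate cancelled
print(rescaled_increment_l0(beta, phi, 100), J0_inf(beta, phi))  # n = 100: identity holds
```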

It follows that the penalty $J_n^{(0)}$ satisfies assumptions (18) and (19) in Theorem 6, and thus we may apply this result to obtain the limit behavior of the minimizer of the AIC in the i.i.d. case, that is, when

$$M_n(\phi) = P_n g(\cdot, \phi) \quad \text{with} \quad g(x, \phi) = -\log p_\phi(x) .$$

Here, $p_\phi$ denotes the density of one observation in the parametric family $\{p_\phi, \phi \in \Phi\}$. Suppose that this model satisfies Assumption 3 and Pollard's conditions (P-1)–(P-4), with $\beta$ denoting the true parameter and with $\Gamma = P(\Delta\Delta^T)$ equal to the Fisher information matrix at parameter $\beta$. We may thus apply Theorem 4 and Theorem 6 successively to the minimizing sequence

$$\hat\beta_n = \mathrm{Argmin}_{\phi}\, \mathrm{AIC}(\phi) .$$

We obtain $\hat\beta_n \to \beta$ in $P^*$-probability and $\sqrt{n}\,(\hat\beta_n - \beta) \rightsquigarrow u$, where $u$ is defined as the minimizer of (20) with $J_\infty = J_\infty^{(0)}$ and $t = 1$. Observe that, in the limit penalty, only the vanishing coordinates of the true parameter $\beta$ are penalized. In other words, for a coordinate $k$ such that $\beta_k = 0$, and only for such a coordinate, we have $u_k = 0$ with positive probability. This property highlights the (well-known) ability of the AIC criterion to select the correct model.

Finally, we note that the AIC can easily be extended to a collection of contrasts $A_n(\phi, t)$, where $t$ is a positive penalty weight (the case $t = 1$ corresponding to the standard AIC). Once $M_n(\phi)$ has been minimized for the $2^p$ possible submodels, the solution path $\hat\beta_n(t)$, which minimizes $A_n(\phi, t)$ for all $t > 0$, is no more difficult to compute than the standard AIC minimizer. One easily sees that the solution path is piecewise constant, with multiple solutions at the discontinuities. Multiple solutions for a finite set of penalty weights $t$ are also present in the limit contrast (20) with $J_\infty = J_\infty^{(0)}$. One can show that there exists almost surely a unique minimizer $u(t)$ of the limit contrast $L(\cdot, t)$ for all $t \in T$ if and only if the closure of the set $T$ has zero Lebesgue measure. In the latter case, $u$ also satisfies Condition (iii) in Theorem 3. This non-uniqueness of the minimizer of the limit contrast did not appear in the previous examples because, for both the $\ell^1$ and $\ell^2$ penalties, the limit contrast was strictly convex. This is no longer true for the $\ell^0$ penalty, so that the convergence $\sqrt{n}\,(\hat\beta_n - \beta) \rightsquigarrow u$ cannot hold in a functional sense in this case. Nevertheless, the convergence continues to hold for the $\ell^0$ penalty in the sense of finite-dimensional convergence because, for a given finite number of penalty weights $t$, there is almost surely a unique minimizer $u(t)$ of $L(\cdot, t)$.
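The piecewise-constant structure of the path can be made explicit. Once all submodels have been fitted, only the smallest contrast $m_k$ among submodels of each size $k$ matters. The path $\hat\beta_n(t)$ then jumps between sizes along the lower envelope of the lines $t \mapsto m_k + t\,k/n$. The sketch below computes the breakpoints of that envelope. Its input format and function name are assumptions. At each breakpoint both adjacent sizes are minimizers, which is the non-uniqueness discussed above.

```python
def aic_path_breakpoints(m, n):
    """Lower envelope of t -> m[k] + t * k / n over support sizes k, for t >= 0.

    m[k] is the smallest contrast M_n over submodels of size k (k = 0, ..., p).
    Returns [(t_start, k), ...]: size k is optimal from t_start up to the next t_start.
    """
    # optimal size at t = 0; ties go to the smaller size, which wins for t > 0
    cur = min(range(len(m)), key=lambda k: (m[k], k))
    pieces = [(0.0, cur)]
    while cur > 0:
        # penalty weight at which each smaller size k catches up with the current one
        cands = [(n * (m[k] - m[cur]) / (cur - k), k) for k in range(cur)]
        t_next, k_next = min(cands)  # earliest crossing; ties broken towards smaller k
        pieces.append((t_next, k_next))
        cur = k_next
    return pieces
```

Sizes that never reach the envelope are skipped automatically. For instance, with `m = [3.0, 2.9, 0.5]` and `n = 10`, the path jumps directly from size 2 to size 0.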



6. Conclusion 

We extended the work of Knight and Fu (2000) in several ways by showing that the asymptotic distribution that they exhibited for penalized least squares continues to hold 1) for the solution path in a functional sense and 2) for a wide variety of contrasts extending the least squares case. We provided several examples of interest. An interesting feature of penalized estimation is that the form of the limit distribution of the regularization path only depends on the penalty since, for any standard contrast, it is given as the path minimizing (20) with $J_\infty$ only depending on the penalty. The marginal limit distribution is discussed in Knight and Fu (2000) for $\ell^\gamma$ penalties with $\gamma > 0$. As pointed out in this reference, a particular feature of the $\ell^1$ penalty is that the limit distribution is compatible with model selection properties but introduces an additional bias on the non-vanishing components. We have shown that the model selection property is preserved by the $\ell^0$ penalty, without introducing an additional bias on the non-vanishing components. However, the $\ell^0$ penalty is much less numerically tractable for a large dimension of the parameter space, and the central limit theorem for the solution path only holds in a finite-dimensional sense. This latter result was derived in Section 5 for the AIC in the i.i.d. case. A similar analysis can clearly be carried out for the AIC applied to time series models or for Mallows' $C_p$ criterion.
Appendix: detailed proofs.

Proof of Theorem 3 in the case (C-2). Recall that, in the case (C-2), we set $\Phi = \mathbb{R}^p$ and $V = \ell^\infty(T, \mathbb{R}^p)$. By Theorem 1.5.4 in van der Vaart and Wellner [1996], to show that $u$ admits a version in $V$ with $u_n \rightsquigarrow u$ in $V$, it is sufficient to show that the finite-dimensional distributions of $u_n$ converge to those of $u$ and that $(u_n)$ is asymptotically tight. The convergence of the finite-dimensional distributions follows from the case (C-1), which we already proved. Hence, to conclude the proof in the case (C-2), it only remains to show that $(u_n)$ is asymptotically tight. In the following, we show that this asymptotic tightness is inherited from that of $(L_n)$ in $\ell^\infty(K \times T)$. Asymptotic tightness follows from an equicontinuity criterion. The proof has two steps. In Step 1, we construct a semi-metric $\bar\rho$ on $T$ based on a semi-metric $\rho$ that makes $L_n$ asymptotically uniformly equicontinuous. In Step 2, we use the semi-metric $\bar\rho$ to prove an equicontinuity criterion for $(u_n)$.



Step 1. By successively applying Lemma 1.3.8 and Theorem 1.5.7 in van der Vaart and Wellner [1996], Condition (i) implies that, for any compact set $K \subset \mathbb{R}^p$, $L_n$ is asymptotically tight in $\ell^\infty(K \times T)$ and there exists a semi-metric $\rho$ on $K \times T$ such that $(K \times T, \rho)$ is totally bounded and $L_n$ is asymptotically uniformly $\rho$-equicontinuous in probability. This means that, for any $\epsilon, \alpha > 0$, there exists $\delta > 0$ such that

$$\limsup_{n \to \infty} P^*\left( \sup_{(u, u') \in S_\delta(K)} |L_n(u) - L_n(u')| > \alpha \right) < \epsilon , \qquad (40)$$

where

$$S_\delta(K) = \{ ((\phi, t), (\phi', t')) \in (K \times T)^2 : \rho((\phi, t), (\phi', t')) < \delta \} .$$

Clearly, the semi-metric $\rho$ can be assumed to be bounded and not to depend on the compact set $K$ without loss of generality. In other words, a bounded semi-metric $\rho$ can be defined on $\mathbb{R}^p \times T$ so that $(\mathbb{R}^p \times T, \rho)$ is totally bounded and $L_n$ is asymptotically uniformly $\rho$-equicontinuous in probability on $K \times T$ for any compact set $K$. We shall use this semi-metric in the following to show that $u_n$ is asymptotically uniformly $\bar\rho$-equicontinuous in probability, where $\bar\rho$ is the semi-metric defined on $T$ by

$$\bar\rho(t, t') = \sup_{\phi \in \mathbb{R}^p} \rho((\phi, t), (\phi, t')) .$$

By [van der Vaart and Wellner, 1996, Theorem 1.5.7], the asymptotic uniform $\bar\rho$-equicontinuity in probability implies that $(u_n)$ is asymptotically tight.

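Step 2. It now remains to show that $(u_n)$ is asymptotically uniformly $\bar\rho$-equicontinuous in probability. Let $\eta$ and $\epsilon$ be two arbitrarily small positive numbers. By Conditions (ii) and (iv), we may choose a compact $K \subset \mathbb{R}^p$ such that

$$P(B) < \epsilon \quad \text{and} \quad \limsup_{n} P^*(B_n) < \epsilon , \qquad (41)$$

where

$$B = \{u(t) \in K \text{ for all } t \in T\}^c \quad \text{and} \quad B_n = \{u_n(t) \in K \text{ for all } t \in T\}^c .$$

Using Condition (iii), we may find an arbitrarily small $\alpha > 0$ such that

$$P\left( \inf_{t \in T} \left[ \inf\{L(\phi, t) : \phi \in K,\ \|\phi - u(t)\| > \eta/2\} - L(u(t), t) \right] < 4\alpha \right) < \epsilon . \qquad (42)$$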
We further choose $\delta > 0$ so that Inequality (40) holds, that is,

$$\limsup_{n} P^*(E_n) < \epsilon , \qquad (43)$$

where

$$E_n = \left\{ \sup_{(u, u') \in S_\delta(K)} |L_n(u) - L_n(u')| > \alpha \right\} .$$

Finally, Condition (v) gives that

$$\limsup_{n} P^*(C_n) = 0 , \qquad (44)$$

where

$$C_n = \left\{ \sup_{t \in T} \left[ L_n(u_n(t), t) - \inf_{\phi \in \mathbb{R}^p} L_n(\phi, t) \right] > \alpha \right\} .$$

On $B_n^c$, we notice that $((u_n(t'), t), (u_n(t'), t')) \in S_\delta(K)$ for every $(t, t')$ such that $\bar\rho(t, t') < \delta$. Hence, on $B_n^c \cap E_n^c$, we have

$$\bar\rho(t, t') < \delta \implies L_n(u_n(t'), t) < L_n(u_n(t'), t') + \alpha . \qquad (45)$$

Suppose for a moment that we are on the set

$$D_n = \left\{ \sup_{\bar\rho(t, t') < \delta} \|u_n(t) - u_n(t')\| > \eta \right\} .$$

Then we may find $(t, t') \in T^2$ such that $\bar\rho(t, t') < \delta$ and $\|u_n(t) - u_n(t')\| > \eta$. On $C_n^c$, we further have $L_n(u_n(t'), t') < \inf_{\phi \in \mathbb{R}^p} L_n(\phi, t') + \alpha$. Intersecting with $B_n^c \cap E_n^c$ and applying (45), we obtain

$$L_n(u_n(t'), t) < \inf_{\phi \in \mathbb{R}^p} L_n(\phi, t') + 2\alpha \leq L_n(u_n(t), t') + 2\alpha < L_n(u_n(t), t) + 3\alpha ,$$

where the last inequality is obtained by exchanging $t$ with $t'$ in (45). Using again that we are on $C_n^c$, we have $L_n(u_n(t), t) < \inf_{\phi \in \mathbb{R}^p} L_n(\phi, t) + \alpha$ and thus, with the last display, we get

$$\max\left( L_n(u_n(t), t), L_n(u_n(t'), t) \right) < \inf_{\phi \in \mathbb{R}^p} L_n(\phi, t) + 4\alpha \leq \inf_{\phi \in K} L_n(\phi, t) + 4\alpha .$$

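Since $\|u_n(t) - u_n(t')\| > \eta$ and $u_n(t)$ and $u_n(t')$ belong to $K$ on $B_n^c$, we just proved that $D_n \cap B_n^c \cap C_n^c \cap E_n^c$ is included in

$$F_n = \left\{ \inf_{t \in T} \left[ \inf_{(\phi, \phi') \in B_\eta(K)} \max\left( L_n(\phi, t), L_n(\phi', t) \right) - \inf_{\phi \in K} L_n(\phi, t) \right] < 4\alpha \right\} ,$$

where

$$B_\eta(K) = \{ (\phi, \phi') \in K^2 : \|\phi - \phi'\| > \eta \} .$$

Using Condition (i) and the continuous mapping theorem, we have $\limsup_n P^*(F_n) \leq P(F)$, where

$$F = \left\{ \inf_{t \in T} \left[ \inf_{(\phi, \phi') \in B_\eta(K)} \max\left( L(\phi, t), L(\phi', t) \right) - \inf_{\phi \in K} L(\phi, t) \right] < 4\alpha \right\} .$$

Since $D_n \cap B_n^c \cap C_n^c \cap E_n^c \subset F_n$, using (41), (43) and (44), we further obtain

$$\limsup_{n} P^*(D_n) \leq P(F) + 2\epsilon .$$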
Observe that, for all $(\phi, \phi') \in B_\eta(K)$ and $t \in T$, we have $\|\phi - u(t)\| > \eta/2$ or $\|\phi' - u(t)\| > \eta/2$. Hence, for all $t \in T$,

$$\inf_{(\phi, \phi') \in B_\eta(K)} \max\left( L(\phi, t), L(\phi', t) \right) \geq \inf\{ L(\phi'', t) : \phi'' \in K,\ \|\phi'' - u(t)\| > \eta/2 \} .$$

Further, by definition of $B$, we have on $B^c$ that, for all $t \in T$, $\inf_{\phi \in K} L(\phi, t) \leq L(u(t), t)$. This and the last display show that $F \cap B^c$ is included in

$$\left\{ \inf_{t \in T} \left[ \inf\{ L(\phi, t) : \phi \in K,\ \|\phi - u(t)\| > \eta/2 \} - L(u(t), t) \right] < 4\alpha \right\} ,$$

which, by (42), has probability at most $\epsilon$ for our choice of $\alpha$. Since $K$ has been chosen so that $P(B) < \epsilon$, we finally get

$$\limsup_{n} P^*(D_n) \leq 4\epsilon .$$

This exactly says that $(u_n)$ is asymptotically uniformly $\bar\rho$-equicontinuous in probability, and the proof is complete. □

Proof of Theorem 5. Let $\epsilon > 0$ and denote by $B' = \{\phi : \|\phi - \beta\| \leq 2\epsilon\}$ and $B = \{\phi : \|\phi - \beta\| \leq \epsilon\}$ the balls centered at $\beta$ with radii $2\epsilon$ and $\epsilon$. We choose $\epsilon$ small enough so that $B' \subset V$. We first show that Assumption 3 holds for $M$ defined on $\Phi$ by

$$M(\phi) = \begin{cases} \Lambda(\phi) & \text{if } \phi \in B , \\ \Lambda(\beta) + a/2 & \text{otherwise,} \end{cases} \qquad (46)$$

where

$$a = \inf_{\phi \in B' \setminus B} \Lambda(\phi) - \Lambda(\beta) > 0 . \qquad (47)$$

The positivity of $a$ follows from the strict convexity of $\Lambda$ and Assumption 4-(iii). Assumption 3-(i) follows from Assumption 4-(ii). Assumption 3-(iii) follows from the strict convexity of $\Lambda$, Assumption 4-(ii) and the definition of $M$ in (46). It only remains to prove that Assumption 3-(ii) holds. By [Rockafellar, 1970, Theorem 10.8], and arguing as in the proof of Lemma 3 in Niemiro [1992] to obtain the result in the sense of convergence in probability, the pointwise convergence in Assumption 4-(ii) implies the uniform convergence on the compact set $B'$, that is,

$$\sup_{\phi \in B'} |M_n(\phi) - \Lambda(\phi)| \xrightarrow{P} 0 . \qquad (48)$$

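Let $\Omega'$ be a probability-one set on which $M_n$ is convex and define

$$\mathcal{A}_n = \left\{ \sup_{\phi \in B'} |M_n(\phi) - \Lambda(\phi)| < a/4 \right\} \cap \Omega' .$$

The set $\mathcal{A}_n$ is measurable: since $M_n$ and $\Lambda$ are convex on $\Phi$, the sup can be replaced by a sup over a countable dense subset of $B'$ without changing the definition of $\mathcal{A}_n$. Let $\omega \in \mathcal{A}_n$. For all $\phi \in B' \setminus B$ and $t \in [0, L]$, we have $M_n(\omega, \phi) > \Lambda(\phi) - a/4$ and $\Lambda(\phi) \geq \Lambda(\beta) + a$. Moreover, since $\beta \in B'$, $\Lambda(\beta) > M_n(\omega, \beta) - a/4$. Hence

$$\inf_{\phi \in B' \setminus B} M_n(\omega, \phi) \geq M_n(\omega, \beta) + a/2 .$$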
By convexity of the function $M_n(\omega, \cdot)$ and of the ball $B'$, the last display implies that

$$\inf_{\phi \in \Phi \setminus B} M_n(\omega, \phi) \geq M_n(\omega, \beta) + a/2 .$$

For all $\omega \in \mathcal{A}_n$, using the definition of $M$ in (46), we thus have, for all $\phi \in \Phi \setminus B$,

$$\{M(\phi) - M_n(\omega, \phi)\}_+ = \{\Lambda(\beta) + a/2 - M_n(\omega, \phi)\}_+ \leq |\Lambda(\beta) - M_n(\omega, \beta)| .$$

Using this with (48) and $P(\mathcal{A}_n) \to 1$, we get Assumption 3-(ii). We conclude that Assumption 3 holds, and we obtain Assertion (i) as an application of Theorem 4.

Next we show Assertion (ii) and thus assume that $J_n$ is strictly convex. The proof of Assertion (iii) is similar and thus omitted. We set

$$L_n = \frac{a}{4 J_n(\beta)} ,$$

so that $L_n \to \infty$ by the assumption on $J_n(\beta)$, and $t J_n(\beta) \leq a/4$ for all $t \leq L_n$. Let $\omega \in \mathcal{A}_n$. Then, for all $\phi \in B' \setminus B$ and $t \in [0, L_n]$, we use that $A_n(\omega, \phi, t) \geq M_n(\omega, \phi)$ and that $M_n(\omega, \beta) = A_n(\omega, \beta, t) - t J_n(\beta) \geq A_n(\omega, \beta, t) - a/4$. We obtain

$$\inf_{t \in [0, L_n]} \inf_{\phi \in B' \setminus B} \left[ A_n(\omega, \phi, t) - A_n(\omega, \beta, t) \right] \geq a/4 .$$

Since $J_n$ is strictly convex, so is the function $A_n(\omega, \cdot, t)$ for $t > 0$. By convexity of the ball $B'$, the previous display implies that, for all $t \in [0, L_n]$, the minimum of $A_n(\omega, \phi, t)$ over $\phi \in \Phi$ is attained within $B$. By strict convexity of $J_n$, this minimum is unique for $t > 0$, and we let $\hat\beta_n(\omega, t)$ be this unique minimizer for $t \in (0, L_n]$. For $\omega \notin \mathcal{A}_n$ or $t > L_n$, we define $\hat\beta_n(\omega, t) = \phi_0$, where $\phi_0$ is any fixed point of $\Phi$. For $t = 0$ and $\omega \in \mathcal{A}_n$, we define

$$\hat\beta_n(\omega, 0) = \liminf_{t \downarrow 0} \hat\beta_n(\omega, t) \in B ,$$

where the liminf is defined component-wise in a given coordinate system of the Euclidean space containing $\Phi$. Since the minimum of $A_n(\omega, \phi, t)$ over $\phi \in \Phi$ is attained within the compact set $B$, and by continuity of $J_n(\phi)$ and $M_n(\omega, \phi)$ in $\phi$, $\hat\beta_n(\omega, 0)$ is a minimizer of $A_n(\omega, \cdot, 0)$ over $\Phi$. Thus, we have defined a r.v. $\hat\beta_n(\cdot, t)$ for any $t \geq 0$, for which the required minimization property holds.

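To conclude the proof, we show that the required continuity property holds. The continuity on $(L_n, \infty)$ for $\omega \in \mathcal{A}_n$ and on $\mathbb{R}_+$ for $\omega \notin \mathcal{A}_n$ directly follows from the definition of $\hat\beta_n(\omega, t)$. Let us now prove that $\hat\beta_n(\omega, \cdot)$ is continuous on $(0, L_n]$ for all $\omega \in \mathcal{A}_n$. Since $J_n$ is convex, it is bounded on $B$. Since $\hat\beta_n(\omega, t) \in B$, we have $\sup_{t \in (0, L_n]} J_n(\hat\beta_n(\omega, t)) \leq \sup J_n(B) < \infty$. Let $t$ and $t_0$ be in $(0, L_n]$. We have

$$\begin{aligned} A_n(\hat\beta_n(\omega, t), t_0) &\leq A_n(\hat\beta_n(\omega, t), t) + |t_0 - t| \sup J_n(B) \\ &\leq A_n(\hat\beta_n(\omega, t_0), t) + |t_0 - t| \sup J_n(B) \\ &\leq A_n(\hat\beta_n(\omega, t_0), t_0) + 2|t_0 - t| \sup J_n(B) . \end{aligned}$$

Since $A_n(\hat\beta_n(\omega, t_0), t_0) \leq A_n(\hat\beta_n(\omega, t), t_0)$, we get that $A_n(\hat\beta_n(\omega, t), t_0) \to A_n(\hat\beta_n(\omega, t_0), t_0)$ as $t \to t_0$. By strict convexity of $A_n$, $\hat\beta_n(\omega, t_0)$ is an isolated minimum of $A_n(\cdot, t_0)$, so this implies that $\hat\beta_n(\omega, t) \to \hat\beta_n(\omega, t_0)$ as $t \to t_0$. The continuity of $\hat\beta_n(\omega, \cdot)$ on $(0, L_n]$ follows, and the proof is complete. □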
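$$a = \inf_{t \in T} \inf_{d(\phi, \beta(t)) \geq \epsilon/2} \left[ A(\phi; t) - A(\beta(t), t) \right] .$$

By (iii), we have $a > 0$. Denote

$$\mathcal{A}_n = \left\{ \sup_{t \in T} d(\hat\beta_n(t), \beta(t)) > \epsilon \right\} .$$

For all $\omega \in \mathcal{A}_n$, there exists $t \in T$ such that $d(\hat\beta_n(\omega, t), \beta(t)) > \epsilon/2$, and thus for which $A(\hat\beta_n(\omega, t), t) - A(\beta(t), t) \geq a$. Hence, for all $\omega \in \mathcal{A}_n$, we have

$$\sup_{t \in T} \left[ A(\hat\beta_n(\omega, t), t) - A(\beta(t), t) \right] \geq a .$$

Now we write, for any $t_0 \in T$,

$$\begin{aligned} A(\hat\beta_n(t_0), t_0) - A(\beta(t_0), t_0) &= \{A(\hat\beta_n(t_0), t_0) - A_n(\hat\beta_n(t_0), t_0)\} \\ &\quad + \{A_n(\hat\beta_n(t_0), t_0) - A_n(\beta(t_0), t_0)\} + \{A_n(\beta(t_0), t_0) - A(\beta(t_0), t_0)\} \\ &\leq \sup_{\phi \in \Phi} \sup_{t \in T} \{A(\phi; t) - A_n(\phi; t)\}_+ \\ &\quad + \sup_{t \in T} \{A_n(\hat\beta_n(t), t) - A_n(\beta(t), t)\}_+ + \sup_{t \in T} |A_n(\beta(t), t) - A(\beta(t), t)| . \end{aligned}$$

Taking the sup over $t_0 \in T$, we obtain $\mathcal{A}_n \subset \mathcal{A}_n^{(1)} \cup \mathcal{A}_n^{(2)} \cup \mathcal{A}_n^{(3)}$, where $\mathcal{A}_n^{(1)} = \{\sup_{\phi \in \Phi} \sup_{t \in T} \{A(\phi; t) - A_n(\phi; t)\}_+ \geq a/3\}$. The events $\mathcal{A}_n^{(2)}$ and $\mathcal{A}_n^{(3)}$ are defined in the same way from the last two terms of the last display. Applying $P^*(\mathcal{A}_n) \leq P^*(\mathcal{A}_n^{(1)}) + P^*(\mathcal{A}_n^{(2)}) + P^*(\mathcal{A}_n^{(3)})$ together with Conditions (i), (iv) and (ii), we thus get $P^*(\mathcal{A}_n) \to 0$, which completes the proof. □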

Proof of Theorem 4. We apply Proposition 1 with $A_n$ defined by (1), $A(\phi, t) = M(\phi)$ and $\beta(t) = \beta$ for all $t$. Let us check the conditions in Proposition 1. Since $J_n$ is non-negative,

$$\{A(\phi; t) - A_n(\phi; t)\}_+ \leq \{M(\phi) - M_n(\phi)\}_+ ,$$

and Condition (i) follows from Assumption 3-(ii). Condition (ii) follows from Assumption 3-(i) and $J_n(\beta) \to 0$. Conditions (iii) and (iv) directly follow from Assumption 3-(iii) and Eq. (12), respectively. Hence (13) follows from (26). □

Proof of Proposition 2. Denote the left-hand side of (32) by $U_n$ and the left-hand side of (29) by $V_n$. Let $\delta > 1$ and define $\mathcal{A}_n = \{U_n > \delta\}$. Then, for all $\omega \in \mathcal{A}_n$, we have

$$n \sup_{t \in T} |G_n(\hat\beta_n(t), t)| \leq 2 \delta^{-1} V_n U_n^2 . \qquad (49)$$

By (31), and using the assumed uniform $P^*$-consistency (26), there exist non-negative random sequences $w_n$ and $W_n$ such that $w_n = o_{P^*}(1)$, $W_n = O_{P^*}(1)$ and

$$n^{1/2} \sup_{t \in T} |R_n(\hat\beta_n(t), t)| \leq U_n w_n + W_n ;$$

hence, for all $\omega \in \mathcal{A}_n$,

$$n \sup_{t \in T} \|\hat\beta_n(t) - \beta(t)\|\, |R_n(\hat\beta_n(t), t)| \leq U_n (U_n w_n + W_n) \leq U_n^2 (w_n + W_n/\delta) .$$

Denote the left-hand side of (27) by $S_n$. The last display, (49) and (28) imply that, for all $\omega \in \mathcal{A}_n$ and all $t \in T$,

$$n H(\hat\beta_n(t), t) \leq n S_n + U_n^2 \{2\delta^{-1} V_n + w_n + W_n/\delta\} .$$

Define $\mathcal{B}_n = \{\sup_{t \in T} d(\hat\beta_n(t), \beta(t)) > \epsilon\}$, where $\epsilon$ is the positive number in (30), and denote the left-hand side of (30) by $a$, which is positive. Then, for all $\omega \in \mathcal{B}_n^c$, $a U_n^2 \leq n \sup_{t \in T} H(\hat\beta_n(t), t)$. Using the previous display, if moreover $\omega \in \mathcal{A}_n$, then

$$a U_n^2 \leq n S_n + U_n^2 \{2\delta^{-1} V_n + w_n + W_n/\delta\} .$$

We use that $P^*(\mathcal{B}_n) \to 0$, $n S_n = O_{P^*}(1)$, $V_n = O_{P^*}(1)$, $w_n = o_{P^*}(1)$ and $W_n = O_{P^*}(1)$. It then easily follows that $\limsup_n P^*(\mathcal{A}_n)$ can be made arbitrarily small by taking $\delta$ large enough. Hence (32) holds. □



Proof of Theorem 7. Let us define $u_n = \sqrt{n}\,(\hat\beta_n - \beta)$ and

$$L_n(\phi, t) = n \left\{ A_n(\beta(t) + n^{-1/2}\phi, t) - A_n(\beta(t), t) \right\} . \qquad (50)$$

We will apply Theorem 3 with these definitions (in the case (C-2)) and now check the conditions of Theorem 3 one by one. Let $K$ be a compact subset of $\mathbb{R}^p$. Using (28), (37) and (50), we get

$$L_n(\phi, t) = \tilde G_n(\phi, t) + n H(\beta(t) + n^{-1/2}\phi, t) + n^{1/2} \|\phi\|\, R_n(\beta(t) + n^{-1/2}\phi, t) .$$

Observe that, by (34) and (35), as functions of $(\phi, t)$,

$$n H(\beta(t) + n^{-1/2}\phi, t) \to \phi^T \Gamma(t)\, \phi \quad \text{uniformly on } K \times T .$$

Applying (36), we obtain

$$\sup_{(\phi, t) \in K \times T} n^{1/2} \left| R_n(\beta(t) + n^{-1/2}\phi, t) \right| = o_{P^*}(1) .$$

Since $\tilde G_n \rightsquigarrow G$ in $\ell^\infty(K \times T, \mathbb{R}^p)$, the last three displays yield $L_n \rightsquigarrow L$ in $\ell^\infty(K \times T, \mathbb{R}^p)$. Since $G$ is tight in $\ell^\infty(K \times T, \mathbb{R}^p)$ by assumption, so is $L$, and thus Condition (i) holds. Conditions (ii) and (iii) hold by assumption. Applying Proposition 2, we obtain (32), and thus Condition (iv) holds. Using (33) with the above definitions, we get that Condition (v) holds. □

Proof of Theorem 6. We shall apply Theorem 7 for $A_n$ given by (1) and with $\beta(t) = \beta$ for all $t \in T$. Let us check that the assumptions of this theorem hold in this context. Condition (33) and the uniform $P^*$-consistency (26) hold by assumption. The decomposition (28) holds with

$$\begin{aligned} G_n(\phi, t) &= (\phi - \beta)^T P_n\Delta + t\,(J_n(\phi) - J_n(\beta))\, 1(\|\phi - \beta\| \leq 1) , \\ H(\phi, t) &= Pg(\cdot, \phi) - Pg(\cdot, \beta) - (\phi - \beta)^T P\Delta , \\ R_n(\phi, t) &= n^{-1/2} \nu_n r(\cdot, \phi) + t\, \|\phi - \beta\|^{-1} (J_n(\phi) - J_n(\beta))\, 1(\|\phi - \beta\| > 1) . \end{aligned}$$

Using (P-1) and (P-3), we have $\sum_{k=1}^n \Delta(X_k) = O_P(n^{1/2})$. Together with (18), this shows that Condition (29) in Theorem 7 holds. Observe that $H(\phi, t)$ does not depend on $t$ and, by (P-2), we have

$$H(\phi, t) = M(\phi) - M(\beta) .$$

Integrating $x$ with respect to $P$ in (16) and using (P-4), we get that the first derivative of $M$ at $\beta$ is zero and, by (P-2),

$$H(\phi, t) = (\phi - \beta)^T \Gamma (\phi - \beta) + o(\|\phi - \beta\|^2) .$$

Hence Conditions (34) and (35) in Theorem 7 hold.



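We have, for any sequence of positive r.v.'s $(r_n)$ such that $r_n \to 0$ in probability,

$$\sup_{\|\phi - \beta\| \leq r_n} n^{-1/2} |\nu_n r(\cdot, \phi)| \leq \frac{1 + \sqrt{n}\, r_n}{\sqrt{n}} \sup_{\|\phi - \beta\| \leq r_n} \frac{|\nu_n r(\cdot, \phi)|}{1 + \sqrt{n}\, \|\phi - \beta\|} = o_P(n^{-1/2}) + o_P(r_n) ,$$

where the last equality follows from (P-4). For $\|\phi - \beta\| \leq r_n$ and $r_n < 1$, the second term defining $R_n$ vanishes, so we obtain Condition (36) in Theorem 7.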
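Defining $\tilde G_n$ as in (37) gives, for $n$ large enough that $n^{-1/2}\|\phi\| \leq 1$,

$$\tilde G_n(\phi, t) = \phi^T (\sqrt{n}\, P_n\Delta) + t\, n \left[ J_n(\beta + n^{-1/2}\phi) - J_n(\beta) \right] .$$

Using (P-1) and (P-3), we have that $\sqrt{n}\, P_n\Delta$ converges in distribution to $W$ and, by (19), for any compact $K \subset \mathbb{R}^p$, $\tilde G_n \rightsquigarrow G$ in $\ell^\infty(K \times T, \mathbb{R}^p)$, where $G(\phi, t) = \phi^T W + t J_\infty(\phi)$. This definition of $G$ and (38) give (20). Hence Theorem 7 yields (21). □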

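Proof of Lemma 1. We have, for all $\phi \in \mathbb{R}^p$,

$$\left| \sum_{k=1}^p |\phi_k|^\gamma - \sum_{k=1}^p |\beta_k|^\gamma \right| \leq C \left( \|\phi - \beta\|^\gamma + \|\phi - \beta\| \right) ,$$

where $C$ only depends on $\beta$ and $\gamma > 0$. The bound (23) follows directly for $\gamma \geq 1$. For $\gamma < 1$, (23) follows from the same inequality by observing that $a^\gamma \leq 1 + a$ for $a \geq 0$, and $n^{\gamma/2} \leq n^{1/2}$.

Relation (24) is easily obtained by using the Taylor expansion, valid for $x \neq 0$, $|x + y|^\gamma = |x|^\gamma + \gamma |x|^{\gamma - 1} \mathrm{sgn}(x)\, y + O(y^2)$, which concludes the proof. □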

Proof of Theorem 1. As $\phi \mapsto M_n(\phi) = n^{-1} \sum_{k=1}^n (y_k - x_k^T\phi)^2$ is a convex function, we apply Theorem 5. In fact, by Assumption 1-(i), $M_n$ is strictly convex for $n$ large enough, and hence the more precise Assertion (iii) applies. We now show that Assumption 4-(ii) holds. We have

$$M_n(\phi) - M_n(\beta) = (\phi - \beta)^T C_n (\phi - \beta) - \frac{2}{n}\, \epsilon_n^T X_n (\phi - \beta) , \qquad (51)$$

where $\epsilon_n = y_n - X_n\beta$. Since

$$E\|X_n^T \epsilon_n\|^2 = E[\mathrm{Tr}(\epsilon_n^T X_n X_n^T \epsilon_n)] = \sigma^2\, \mathrm{Tr}[X_n X_n^T] = O(n)$$

by Assumption 1-(ii), it follows that $\frac{2}{n}\, \epsilon_n^T X_n (\phi - \beta) = O_P(n^{-1/2})$. Furthermore, by Assumption 1-(ii),

$$M_n(\phi) - M_n(\beta) \xrightarrow{P} (\phi - \beta)^T C (\phi - \beta) = \Lambda(\phi) .$$

Since $C$ is positive-definite, $\Lambda$ is strictly convex and Assumption 4-(ii) holds. By definition of $\hat\beta_n(t)$, (12) holds. Finally, the condition $J_n(\beta) \to 0$ holds, as the penalty is defined by $J_n(\beta) = \lambda_n \|\beta\|_1$, with $\|\cdot\|_1$ denoting the $\ell^1$ norm. The uniform consistency on every compact set follows as an application of Theorem 5. □
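Decomposition (51) is an exact algebraic identity. The following sketch checks it on synthetic data, assuming for illustration a Gaussian design and noise drawn with Python's `random` module. It evaluates both sides with $C_n = n^{-1}X_n^T X_n$ and $\epsilon_n = y_n - X_n\beta$. The helper name is hypothetical.

```python
import random


def check_decomposition(n=50, p=3, seed=1):
    """Return both sides of (51) for simulated data: M_n(phi) - M_n(beta) and its expansion."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [sum(a * b for a, b in zip(row, beta)) + e for row, e in zip(X, eps)]
    phi = [rng.gauss(0.0, 1.0) for _ in range(p)]

    def M_n(v):
        # least-squares contrast n^{-1} ||y - X v||^2
        return sum((yk - sum(a * b for a, b in zip(row, v))) ** 2 for yk, row in zip(y, X)) / n

    d = [a - b for a, b in zip(phi, beta)]
    C_n = [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(p)] for i in range(p)]
    quad = sum(d[i] * C_n[i][j] * d[j] for i in range(p) for j in range(p))
    cross = 2.0 / n * sum(e * sum(a * b for a, b in zip(row, d)) for row, e in zip(X, eps))
    return M_n(phi) - M_n(beta), quad - cross
```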

Proof of Theorem 2. We apply Theorem 7 with $T$ a compact subset of $\mathbb{R}_+$. By definition of $\hat\beta_n(t)$, condition (33) holds. We just obtained uniform consistency in Theorem 1. Using (51), we have the decomposition (28) of $A_n(\phi, t)$, with

$$\begin{aligned} G_n(\phi, t) &= -2 n^{-1/2} U_n^T (\phi - \beta) + t \lambda_n (\|\phi\|_1 - \|\beta\|_1) , \\ H(\phi, t) &= (\phi - \beta)^T C (\phi - \beta) \quad \text{and} \quad R_n(\phi, t) = \|\phi - \beta\|^{-1} (\phi - \beta)^T (C_n - C)(\phi - \beta) , \end{aligned}$$

where $U_n = n^{-1/2} X_n^T \epsilon_n$ and $\lambda_n = n^{-1/2}$, by Assumption 2-(iii).

The sequence $(U_n)$ converges in distribution to $U \sim \mathcal{N}(0, \sigma^2 C)$ by the Lindeberg–Feller theorem and Assumption 2. For all $\phi \in \mathbb{R}^p$ and $t \in T$, we have $n |G_n(\phi, t)| \leq 2\sqrt{n}\, \|U_n\|\, \|\phi - \beta\| + t \sqrt{n}\, \big| \|\phi\|_1 - \|\beta\|_1 \big| \leq \|\phi - \beta\| (O_P(\sqrt{n}) + c\sqrt{n})$, where $c$ is a positive constant. Hence $G_n$ satisfies (29).

Conditions (34) and (35) on $H$ are immediately verified by taking $\Gamma(t) = C$ for all $t \in T$ and using Assumption 1-(ii).

Observe that $|R_n(\phi, t)| \leq \rho(C_n - C)\, \|\phi - \beta\|$, where $\rho(C_n - C)$ is the spectral radius of $(C_n - C)$. Since $C_n \to C$, we have $\rho(C_n - C) = o_P(1)$ and $\sup\{|R_n(\phi, t)| : t \in T,\ \|\phi - \beta\| \leq r_n\} = o_P(r_n)$. Condition (36) on $R_n$ follows.
As in (37), we define

$$\tilde G_n(\phi, t) = n G_n(\beta + n^{-1/2}\phi, t) = -2 U_n^T \phi + t\, n^{1/2} \sum_{j=1}^p \left\{ |\beta_j + n^{-1/2}\phi_j| - |\beta_j| \right\} .$$

For any compact $K \subset \mathbb{R}^p$, let $f$ map $u \in \mathbb{R}^p$ to $f[u] \in \ell^\infty(K \times T)$, defined by $f[u](\phi, t) = u^T\phi$. The map $f$ is continuous and, by the continuous mapping theorem, $f(U_n)$ converges to $f(U)$ in $\ell^\infty(K \times T)$. From this and (24) with $\gamma = 1$, it follows that $\tilde G_n$ converges to $G$ in $\ell^\infty(K \times T)$, where

$$G(\phi, t) = -2 U^T\phi + t \sum_{j=1}^p \left\{ \phi_j\, \mathrm{sgn}(\beta_j)\, 1_{\{\beta_j \neq 0\}} + |\phi_j|\, 1_{\{\beta_j = 0\}} \right\} .$$

By Assumption 1-(ii), one has $L(\phi, t) \geq c_1 \|\phi\|^2 + c_2 \|\phi\|$ for all $\phi \in \mathbb{R}^p$ and $t \in T$, with $c_1 > 0$ and $c_2$ a finite random variable. Since $L(0, t) = 0$, we get $0 \geq L(u(t), t) \geq c_1 \|u(t)\|^2 + c_2 \|u(t)\|$, and thus $\|u(t)\| \leq -c_2/c_1$. Condition (ii) of Theorem 3 follows immediately. Condition (iii) of Theorem 3 also follows, since $L(\phi, t)$ is continuous in $(\phi, t)$ and strictly convex in $\phi$. The convergence stated in Theorem 2 follows as an application of Theorem 7. □
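In dimension $p = 1$ with $C = c > 0$, the limit contrast $L(\phi, t) = -2U\phi + t J_\infty^{(1)}(\phi) + c\phi^2$ has an explicit minimizer. It is built from $G$ above and $\Gamma = C$. When $\beta \neq 0$ the minimizer is an affine shift; when $\beta = 0$ it is a soft-thresholding of $2U$. The sketch below uses hypothetical names and assumes $U \sim \mathcal{N}(0, \sigma^2 c)$. It illustrates that, when $\beta = 0$, $u(t) = 0$ with probability $P(2|U| \leq t) > 0$, by comparing a Monte Carlo frequency with the exact value.

```python
import math
import random


def limit_contrast(phi, U, c, t, beta):
    # L(phi, t) = -2 U phi + t J_inf^{(1)}(phi) + c phi^2 in dimension one
    pen = phi * math.copysign(1.0, beta) if beta != 0.0 else abs(phi)
    return -2.0 * U * phi + t * pen + c * phi * phi


def limit_minimizer(U, c, t, beta):
    """Minimizer of limit_contrast(., U, c, t, beta) for c > 0."""
    if beta != 0.0:
        return (2.0 * U - t * math.copysign(1.0, beta)) / (2.0 * c)
    # beta = 0: soft-thresholding, exactly zero whenever 2|U| <= t
    return math.copysign(max(2.0 * abs(U) - t, 0.0), U) / (2.0 * c)


def zero_frequency(c=1.0, sigma=1.0, t=1.0, draws=20000, seed=0):
    """Monte Carlo frequency of u(t) = 0 when beta = 0, and the exact P(|U| <= t / 2)."""
    rng = random.Random(seed)
    sd = sigma * math.sqrt(c)  # U ~ N(0, sigma^2 c)
    zeros = sum(limit_minimizer(rng.gauss(0.0, sd), c, t, 0.0) == 0.0 for _ in range(draws))
    exact = math.erf(t / (2.0 * sd * math.sqrt(2.0)))
    return zeros / draws, exact
```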